Example for rigorous two-phase locking - database

The image below shows an example of an S2PL transaction. Can anyone convert this example to R2PL?

The difference between S2PL and R2PL really only shows up in the second phase, i.e. how they release locks.
Under S2PL, a transaction must hold all its exclusive (write) locks until it commits or aborts, while under R2PL all locks (shared and exclusive) are released only after the commit or abort.
So, to convert the example to R2PL, you just have to move unlock(A) to after the commit point and before unlock(B).
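Since the original image isn't reproduced here, here is a minimal sketch of the idea in Python. The reads/writes are illustrative assumptions (a transaction that takes a shared lock on A and an exclusive lock on B); the unlock ordering follows the answer above.

```python
# Hypothetical schedule matching the answer above: under S2PL the shared lock
# on A may be released before the commit point; under R2PL every unlock moves
# after the commit point.

S2PL_SCHEDULE = [
    "slock(A)", "read(A)",
    "xlock(B)", "write(B)",
    "unlock(A)",        # shared lock released early: allowed by S2PL, not R2PL
    "commit",
    "unlock(B)",
]

R2PL_SCHEDULE = [
    "slock(A)", "read(A)",
    "xlock(B)", "write(B)",
    "commit",
    "unlock(A)",        # moved after the commit point, before unlock(B)
    "unlock(B)",
]

def is_rigorous(schedule):
    """True if no lock is released before the commit/abort point."""
    ended = False
    for op in schedule:
        if op in ("commit", "abort"):
            ended = True
        elif op.startswith("unlock") and not ended:
            return False
    return True

print(is_rigorous(S2PL_SCHEDULE))  # False
print(is_rigorous(R2PL_SCHEDULE))  # True
```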

Related

Difference between 2PC (2 phase commit) and 2PL (2 phase locking)

What is the difference between the two? The protocols look different on the surface, but I would like to understand what is really different between them and why they are not equivalent.
2-phase locking is a mechanism implemented within a single database instance to achieve the serializable isolation level. Serializable is the strongest isolation level: even with transactions executing in parallel, the end result is the same as if the transactions had been executed serially. It works as follows:
Whenever a transaction wants to update an object/row, it must acquire a write/exclusive lock; whenever it wants to read an object/row, it must acquire a read/shared lock. Instead of releasing each lock immediately after the corresponding query, the locks must be held until the end of the transaction (commit or abort). So while the transaction is being executed, the number of locks it holds expands/grows. (Read/write lock behavior is similar to any other reader/writer locking mechanism, so it is not discussed here.)
At the end of the transaction, the locks are released and the number of locks held by the transaction shrinks.
Since the locks are acquired in one phase and released in another phase, i.e. there are no lock releases in the acquire phase and no new lock acquisitions in the release phase, this is called 2-phase locking.
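To make the grow-then-shrink behaviour concrete, here is a minimal in-memory sketch (not a real lock manager; the class and method names are illustrative, and shared locks are simplified to exclusive ones):

```python
import threading

class Transaction:
    def __init__(self, lock_table):
        self._lock_table = lock_table   # item name -> threading.RLock
        self._held = []                 # locks acquired so far

    def read(self, item):
        self._acquire(item)             # a real system would take a shared lock here
        # ... read the item ...

    def write(self, item):
        self._acquire(item)             # exclusive lock
        # ... write the item ...

    def _acquire(self, item):
        lock = self._lock_table[item]
        if lock not in self._held:
            lock.acquire()              # expanding phase: lock count only grows
            self._held.append(lock)

    def commit(self):
        # ... make the changes durable ...
        self._release_all()             # shrinking phase happens only at the end

    def abort(self):
        # ... undo the changes ...
        self._release_all()

    def _release_all(self):
        while self._held:
            self._held.pop().release()  # lock count shrinks to zero

# Usage: locks are held to the end, so concurrent transactions on the same
# items behave as if they had run one after the other.
locks = {"x": threading.RLock(), "y": threading.RLock()}
t = Transaction(locks)
t.write("x"); t.read("y")
t.commit()
```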
2-phase commit is an algorithm for implementing a distributed transaction across multiple database instances, ensuring that all nodes either commit or abort the transaction.
It works by having a coordinator (which could be a separate service or a library within the application initiating the transaction) issue two requests: PREPARE to all nodes in phase 1, and then COMMIT (if all nodes returned OK in the PREPARE phase) or ABORT (if any node returned NOT OK in the PREPARE phase) to all nodes in phase 2.
TLDR:
2 phase locking - for serializable isolation within a single database instance
2 phase commit - atomic commit across multiple nodes of a distributed database/datastores
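A minimal sketch of the coordinator logic described above; prepare()/commit()/abort() are hypothetical calls (e.g. RPCs) on each participant, and the coordinator-failure handling, logging and retries of a real implementation are omitted:

```python
def two_phase_commit(participants):
    # Phase 1: PREPARE - every node must vote OK before anything commits.
    votes = []
    for node in participants:
        try:
            votes.append(bool(node.prepare()))   # node durably records its vote
        except Exception:
            votes.append(False)                  # an unreachable node counts as NOT OK

    # Phase 2: COMMIT only if every node voted OK, otherwise ABORT everywhere.
    if all(votes):
        for node in participants:
            node.commit()
        return "committed"
    for node in participants:
        node.abort()
    return "aborted"
```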

Why do we need Rigorous 2PL when we have Strict 2PL? [duplicate]

In 2PL (two phase locking), what advantage(s) does the rigorous model have over the strict model?
I) There is no advantage over the strict model.
II) In contrast to the strict model, it guarantees that starvation cannot occur.
III) In contrast to the strict model, it guarantees that deadlock cannot occur.
IV) In contrast to the strict model, there is no need to predict data needed in the future.
My notes say all of the above are false. I am a bit confused. Can someone clarify for me why all of these are false?
What is the Two-Phase Locking (2PL) protocol?
A transaction is two-phase locked if:
before reading x, it sets a read lock on x
before writing x, it sets a write lock on x
it holds each lock until after it executes the corresponding operation
after its first unlock operation, it requests no new locks
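A small sketch that checks the last rule (the actual "two-phase" property: no new lock requests after the first unlock). The schedule format is an assumption, and read/write locks are collapsed into a single lock type for brevity:

```python
def is_two_phase(schedule):
    """schedule: list of (op, item) with op in {"lock", "unlock", "read", "write"}."""
    unlocked_something = False
    held = set()
    for op, item in schedule:
        if op == "lock":
            if unlocked_something:
                return False            # new lock after the first unlock: not 2PL
            held.add(item)
        elif op == "unlock":
            unlocked_something = True
            held.discard(item)
        elif op in ("read", "write") and item not in held:
            return False                # must hold a lock on the item before using it
    return True

print(is_two_phase([("lock", "x"), ("read", "x"), ("unlock", "x"),
                    ("lock", "y"), ("write", "y"), ("unlock", "y")]))   # False
print(is_two_phase([("lock", "x"), ("lock", "y"), ("read", "x"),
                    ("write", "y"), ("unlock", "x"), ("unlock", "y")])) # True
```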
Now, what is strict two-phase locking?
Here a transaction must hold all its exclusive locks till it commits/aborts.
But what's rigorous 2PL?
Rigorous two-phase locking is even stricter: here all locks are held till commit/abort. In this protocol transactions can be serialized in the order in which they commit.
Going a bit deeper:
Strict 2PL: same as 2PL, but all exclusive locks are held until the transaction has committed or aborted. It guarantees cascadeless recoverability.
Rigorous 2PL: same as strict 2PL, but all locks (shared and exclusive) are held until the transaction has committed or aborted. It is used in dynamic environments where data access patterns are not known beforehand.
When combined with a timestamp-based scheme such as wait-die, there is no deadlock: a younger transaction requesting an item held by an older transaction is aborted and restarted with the same timestamp, so starvation is avoided as well.
I hope the explanation above has made the concept, and the advantages of rigorous 2PL over strict 2PL, clear.
Thanks
I - there is an advantage
Look at this lecture note from UCLA:
Rigorous two-phase locking has the advantages of strict 2PL. In addition
it has the property that for two conflicting transactions, their commit order
is their serializability order. In some systems users might expect this behavior.
These lecture notes have an example (the model in the example is strict - not rigorous):
Consider two transactions conducted at the same site in which a long-running transaction T1 that reads x is ordered before a short transaction T2 that writes x. T2 returns first, showing an updated version of x long before T1, which is based on the old version, completes.
II and III - do not affect deadlocks/starvation
Rigorous 2PL means that all locks are released after the transaction ends, as opposed to strict 2PL, where read-only locks may be released earlier. This doesn't affect deadlocks or starvation, as those occur in the expanding phase (a transaction cannot acquire a lock it needs). In a deadlock, both transactions are always in the expanding phase.
IV - both need to know the needed data for locking in the expanding phase - shrinking phase varies
Strict: I don't know the usual implementation details of strict 2PL, but if a read lock is released before a transaction ends, there has to be knowledge (a 100% sure prediction, if you like) that the lock is not needed later in the transaction.
Rigorous: All the read locks are released at the end of a transaction and the transaction never has to evaluate if it should release a read lock or keep it for later reads in the transaction.
Is rigorous or strict more used/preferred?
Which of those two models to use depends on the situation. Modern DBMSs use more complex concurrency handling than simple rigorous or strict 2PL. Having said that, judging by the Wikipedia article on two-phase locking, the rigorous (SS2PL) model is the more widely used one:
SS2PL [rigorous] has been the concurrency control protocol of choice for most database systems and utilized since their early days in the 1970s. [...]
2PL in its general form, as well as when combined with Strictness, i.e., Strict 2PL (S2PL), are not known to be utilized in practice. The popular SS2PL does not require marking "end of phase-1" as 2PL and S2PL do, and thus is simpler to implement. Also, unlike the general 2PL, SS2PL provides, as mentioned above, the useful Strictness and Commitment ordering properties. [...]
SS2PL Vs. S2PL: Both provide Serializability and Strictness. Since S2PL is a super class of SS2PL it may, in principle, provide more concurrency. However, no concurrency advantage is typically practically noticed (exactly same locking exists for both, with practically not much earlier lock release for S2PL), and the overhead of dealing with an end-of-phase-1 mechanism in S2PL, separate from transaction-end, is not justified. Also, while SS2PL provides Commitment ordering, S2PL does not. This explains the preference of SS2PL over S2PL. [...]
Transaction T2 in the above example does not follow 2PL or S2PL, as a lock request (lock(B)) is made after the lock release unlock(A), hence violating the protocol.
Rigorous two-phase locking is similar to strict two-phase locking, with two major differences:
In strict two-phase locking the shared locks are released in the shrinking phase, but in rigorous two-phase locking all the shared and exclusive locks are kept until the end of the transaction.
In rigorous two-phase locking we do not need to know the access pattern of locks on data items beforehand, so it is more appropriate for dynamic environments, while in strict two-phase locking the access pattern of locks should be specified at the start of the transaction.
So the fourth option is the correct one.

Prioritizing Transactions in Google AppEngine

Let's say I need to perform two different kinds of write operations on a datastore entity that might happen simultaneously, for example:
The client that holds a write-lock on the entry updates the entry's content
The client requests a refresh of the write-lock (updates the lock's expiration time-stamp)
As the content-update operation is only allowed if the client holds the current write-lock, I need to perform the lock-check and the content-write in a transaction (unless there is another way that I am missing?). Also, a lock-refresh must happen in a transaction because the client needs to first be confirmed as the current lock-holder.
The lock-refresh is a very quick operation.
The content-update operation can be quite complex. Think of it as the client sending the server a complicated update-script that the server executes on the content.
Given this, if there is a conflict between those two transactions (should they be executed simultaneously), I would much rather have the lock-refresh operation fail than the complex content-update.
Is there a way that I can "prioritize" the content-update transaction? I don't see anything in the docs and I would imagine that this is not a specific feature, but maybe there is some trick I can use?
For example, what happens if my content-update reads the entry, writes it back with a small modification (without committing the transaction), then performs the lengthy operation and finally writes the result and commits the transaction? Would the first write be applied immediately and cause a simultaneous lock-refresh transaction to fail? Or are all writes kept until the transaction is committed at the end?
Is there such a thing as keeping two transactions open? Or doing an intermediate commit in a transaction?
Clearly, I can just split my content-update into two transactions: The first one sets a "don't mess with this, please!"-flag and the second one (later) writes the changes and clears that flag.
But maybe there is some other trick to achieve this with fewer reads/writes/transactions?
Another thought I had was that there are 3 different "blocks" of data: The current lock-holder (LH), the lock expiration (EX), and the content that is being modified (CO). The lock-refresh operation needs to perform a read of LH and a write to EX in a transaction, while the content-update operation needs to perform a read of LH, a read of CO, and a write of CO in a transaction. Is there a way to break the data apart into three entities and somehow have the transactions span only the needed entities? Since LH is never modified by these two operations, this might help avoid the conflict in the first place?
The datastore uses optimistic concurrency control, which means that a (primitive) datastore transaction does its work and then, at commit time, succeeds only if someone else hasn't committed a conflicting change first. Typically, the app retries the failed transaction with fresh data. There is no way to modify this first-wins behavior.
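The usual pattern is a retry loop around the transactional work. Here is a hedged sketch, where run_in_transaction and ConflictError are placeholders for whatever your datastore client library actually provides (they are not real API names):

```python
import random
import time

class ConflictError(Exception):
    """Placeholder for the library's 'someone else committed first' error."""

def with_retries(run_in_transaction, work, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return run_in_transaction(work)   # re-reads fresh data on each attempt
        except ConflictError:
            # back off a little and try again with fresh data
            time.sleep(random.uniform(0, 0.1 * 2 ** attempt))
    raise RuntimeError("transaction kept conflicting; giving up")
```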
It might help to know that datastore transactions are strongly consistent, so a client can first commit a lock refresh with a synchronous datastore call, and when that call returns, the client knows for sure whether it obtained or refreshed the lock. The client can then proceed with its update and lock clear. The case you describe where a lock refresh and an update might occur concurrently from the same client sounds avoidable.
I'm assuming you need the lock mechanism to prevent writes from other clients while the lock owner performs multiple datastore primitive transactions. If a client is actually only doing one update before it releases the lock and it can do so within seconds (well before the datastore RPC timeout), you might get by with just a primitive datastore transaction with optimistic concurrency control and retries. But a lock might be a good idea for simple serialization of, say, edits to a record in a user interface, where a user hits an "edit" button in a UI and you want that to guarantee that the user has some time to prepare and submit changes without the record being changed by someone else. (Whether that's the user experience you want is your decision. :) )

Oracle deadlock without explicit locking and read committed isolation level, why?

I get this error Message: ORA-00060: deadlock detected while waiting for resource even though I am not using any explicit table locking and my isolation level is set to READ COMMITTED.
I use multiple threads over the Spring TransactionTemplate with default propagation. In my business logic the data is separated so that two transactions will never have the same set of data. Therefore I don't need SERIALIZABLE.
Why can Oracle detect a deadlock? Deadlocks should be impossible in this constellation, or am I missing something? If I'm not missing anything, then my separation algorithm must be wrong, right? Or could there be some other explanation?
Oracle by default does row level locking. You mention using multiple threads. I suspect one thread is locking one row then attempting to lock another which has been locked by another thread. That other thread is then attempting to lock the row the first thread locked. At this point, Oracle will automatically detect a deadlock and break it. The two rows mentioned above could be in the same table or in different tables.
A careful review of what each thread is doing is the starting point. It may be necessary to decide to not run things in parallel, or it may be necessary to use an explicit locking mechanism (select for update for example).
Encountering deadlocks has nothing to do per se with the isolation level. When a row is inserted/updated/deleted, Oracle locks the row. If you have two transactions running concurrently and trying to change the same rows, you can encounter a deadlock. The emphasis is on "CAN". This generally happens when different types of transactions take locks in a different order, which is a sign of bad transaction design.
As was previously mentioned, a trace file is generated when a deadlock is encountered. If you look at the trace file, you can determine which two sessions are involved in the deadlock. In addition, it also shows the respective SQL statements.
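A sketch of the "take locks in a consistent order" fix mentioned above, using SELECT ... FOR UPDATE through the Python DB-API; the connection object, table and column names are assumptions for illustration:

```python
def transfer(conn, account_a, account_b, amount):
    # Always lock the lower-numbered account first, so two concurrent transfers
    # (A->B and B->A) request their row locks in the same order and therefore
    # cannot deadlock each other.
    first, second = sorted((account_a, account_b))
    cur = conn.cursor()
    cur.execute("SELECT balance FROM accounts WHERE id = :1 FOR UPDATE", [first])
    cur.execute("SELECT balance FROM accounts WHERE id = :1 FOR UPDATE", [second])
    cur.execute("UPDATE accounts SET balance = balance - :1 WHERE id = :2",
                [amount, account_a])
    cur.execute("UPDATE accounts SET balance = balance + :1 WHERE id = :2",
                [amount, account_b])
    conn.commit()
```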

To NOLOCK or NOT to NOLOCK, that is the question

This is really more of a discussion than a specific question about nolock.
I recently took over an app in which almost every query (and there are lots of them) has the NOLOCK hint on it. Now I am pretty new to SQL Server (I used Oracle for 10 years), but I find this pretty disturbing. This weekend I was talking with a friend who runs a rather large e-commerce site (the name will be withheld to protect the guilty), and he says he has to do this with all of his SQL Servers because otherwise he always ends up in deadlocks.
Is this just a huge shortfall with SQL Server? Is this just a failure in the DB design (mine is not third normal form, but it's close)? Is anybody out there running a SQL Server app without NOLOCK? These are issues that Oracle handles better with more granular record locks.
Is SQL Server just not able to handle big loads? Is there some better workaround than reading uncommitted data? I would love to hear what people think.
Thanks
SQL Server added snapshot isolation in SQL Server 2005; this enables you to still read the latest correct value without having to wait for locks. StackOverflow is also using snapshot isolation. The snapshot isolation level is more or less the same as what Oracle uses, which is why deadlocks are not very common on an Oracle box. Just be aware that you need plenty of tempdb space if you do enable it.
From Books Online:
When the READ_COMMITTED_SNAPSHOT database option is set ON, read committed isolation uses row versioning to provide statement-level read consistency. Read operations require only SCH-S table level locks and no page or row locks. When the READ_COMMITTED_SNAPSHOT database option is set OFF, which is the default setting, read committed isolation behaves as it did in earlier versions of SQL Server. Both implementations meet the ANSI definition of read committed isolation.
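For reference, turning the option on is a one-off ALTER DATABASE; here is a sketch run from Python with pyodbc (the connection string and database name are assumptions, and the statement needs exclusive access to the database):

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;"
    "DATABASE=master;UID=sa;PWD=secret",
    autocommit=True,  # ALTER DATABASE cannot run inside a user transaction
)
cur = conn.cursor()
cur.execute("ALTER DATABASE MyAppDb SET ALLOW_SNAPSHOT_ISOLATION ON")
cur.execute("ALTER DATABASE MyAppDb SET READ_COMMITTED_SNAPSHOT ON "
            "WITH ROLLBACK IMMEDIATE")  # kicks other sessions out so it can take effect
```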
If somebody says that without NOLOCK their application always gets deadlocked, then there is (more than likely) a problem with their queries. A deadlock means that two transactions cannot proceed because of resource contention and the problem cannot be resolved. An example:
Consider Transactions A and B. Both are in-flight. Transaction A has inserted a row into table X and Transaction B has inserted a row into table Y, so Transaction A has an exclusive lock on X and Transaction B has an exclusive lock on Y.
Now, Transaction A needs to run a SELECT against table Y and Transaction B needs to run a SELECT against table X.
The two transactions are deadlocked: A needs resource Y and B needs resource X. Since neither transaction can proceed until the other completes, the situation cannot be resolved: neither transaction's demand for a resource can be satisfied until the other transaction releases its lock on the resource in contention (either by ROLLBACK or COMMIT, it doesn't matter).
SQL Server identifies this situation, selects one transaction or the other as the deadlock victim, aborts that transaction and rolls it back, leaving the other transaction free to proceed to its presumable completion.
Deadlocks are rare in real life (IMHO). One rectifies them by
ensuring that transaction scope is as small as possible, something SQL server does automatically (SQL Server's default transaction scope is a single statement with an implicit COMMIT), and
ensuring that transactions access resources in the same sequence. In the example above, if transactions A and B both locked resources X and Y in the same sequence, there would not be a deadlock.
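A toy illustration of the X/Y example using plain Python locks; acquire timeouts stand in for SQL Server's deadlock detector (which would instead pick one victim and let the other transaction finish):

```python
import threading
import time

X, Y = threading.Lock(), threading.Lock()

def worker(first, second, name, results):
    with first:                               # exclusive lock on the first "table"
        time.sleep(0.1)                       # give the other transaction time to start
        if second.acquire(timeout=1):         # would block forever in a real deadlock
            second.release()
            results[name] = "completed"
        else:
            results[name] = "deadlock victim"

results = {}
# Transaction A locks X then wants Y; Transaction B locks Y then wants X.
a = threading.Thread(target=worker, args=(X, Y, "A", results))
b = threading.Thread(target=worker, args=(Y, X, "B", results))
a.start(); b.start(); a.join(); b.join()
print(results)   # opposite lock order -> both time out waiting on each other;
                 # acquiring X and Y in the same order in both threads avoids this
```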
Timeouts
A timeout, on the other hand, occurs when a transaction exceeds its wait time and is rolled back due to resource contention. For instance, Transaction A needs resource X. Resource X is locked by Transaction B, so Transaction A waits for the lock to be released. If the lock isn't released within the query's timeout limit, the waiting transaction is aborted and rolled back. Every query has a query timeout associated with it (the default value is 30s, I believe), after which time the transaction is aborted and rolled back. The query timeout can be set to 0s, in which case SQL Server will let the query wait forever.
This is probably what they are talking about. In my experience, timeouts like this usually occur in big databases when large batch jobs are updating thousands and thousands of records in a single transaction, although they can also happen because a transaction goes on too long (connect to your production database in Query Analyzer, execute BEGIN TRANSACTION, update a single row in a frequently hit table, and go to lunch without executing ROLLBACK or COMMIT TRANSACTION, then see how long it takes for the production DBAs to go apes**t on you. Don't ask me how I know this.)
This sort of timeout is usually what results in splattering perfectly innocent SQL with all sorts of NOLOCK hints.
[TIP: if you're going to do that, just execute SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED as the first statement in your stored procedure and have done with it.]
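A session-level variant of that tip, sketched with pyodbc (the DSN, credentials and table are assumptions); it applies READ UNCOMMITTED to every statement on the connection instead of sprinkling NOLOCK hints per table:

```python
import pyodbc

conn = pyodbc.connect("DSN=Production;UID=reporting;PWD=secret")
cur = conn.cursor()
cur.execute("SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED")  # same effect as NOLOCK on every table
cur.execute("SELECT COUNT(*) FROM Orders")                       # hypothetical table
print(cur.fetchone()[0])
```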
The problem with this approach (NOLOCK/READ UNCOMMITTED) is that you can read uncommitted data from other transactions: stuff that is incomplete or that may get rolled back later, so your data integrity is compromised. You might be sending out a bill based on data with a high level of bogosity.
My general rule is that one should avoid the use of table hints insofar as possible. Let SQL Server and its query optimizer do their jobs.
The right way to avoid this sort of issue is to avoid the sort of transactions (inserting a million rows in one fell swoop, for instance) that cause the problems. The locking strategy implicit in relational SQL databases is designed around small transactions of short scope. Locks should be small in scope and short in duration. Think "bank teller updating somebody's checking account with a deposit" as the underlying use case. Design your processes to work in that model and you'll be much happier all the way 'round.
Instead of inserting a million rows in one mondo insert statement, do the work in independent chunks and commit each chunk independently (see the sketch below). If your million-row insert dies after processing 999,000 rows, all the work done is lost (not to mention that the rollback can be a b*tch, and the table is still locked during the rollback as well). If you insert the million rows in blocks of 1,000 rows each, committing after each block, you avoid the lock contention that causes deadlocks, as locks will be obtained and released and things will keep moving. If something goes south in the 999th block of 1,000 rows and that transaction gets aborted and rolled back, you've still gotten 998,000 rows inserted; you've only lost 1,000 rows of work. Restart/retry is much easier.
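A sketch of that chunked approach against the Python DB-API (the table, columns and connection object are assumptions):

```python
CHUNK_SIZE = 1000

def insert_in_chunks(conn, rows):
    """rows: list of (id, amount) tuples."""
    cur = conn.cursor()
    for start in range(0, len(rows), CHUNK_SIZE):
        chunk = rows[start:start + CHUNK_SIZE]
        cur.executemany(
            "INSERT INTO staging_orders (id, amount) VALUES (?, ?)", chunk)
        conn.commit()   # locks taken for this chunk are released here
        # if a later chunk fails, everything committed so far is kept,
        # so a restart only needs to redo the failed chunk onward
```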
Also, lock escalation occurs in large transactions. For efficiency, locks escalate to larger and larger scope as the number of locks held by the transaction increases. If a single transaction inserts/updates/deletes a single row in a table, I get a row lock. Keep doing that, and once the number of row locks held by that transaction against that table hits a threshold value, SQL Server will escalate the locking strategy: the row locks are consolidated and converted into a smaller number of page locks, thus increasing the scope of the locks held. From that point forward, an insert/delete/update of a single row will lock that page in the table. Once the number of page locks held hits its threshold value, the page locks are again consolidated and the locking strategy escalates to table locks: the transaction now locks the entire table and nobody else may play until the transaction commits or rolls back.
Whether you can functionally avoid the use of NOLOCK/READ UNCOMMITTED is entirely dependent on the nature of the processes hitting the underlying database (and the culture of the organization owning it).
Myself, I try to avoid its use as much as possible.
Hope this helps.
No, there is no need to use NOLOCK. Links: SO 1
As for load, we deal with 2000 rows per second which is small change compared to 35k TPS
Deadlocks are caused by lock contention, usually by an inconsistent write order on tables across transactions. ORMs especially are rubbish at this. We get them very infrequently. A well-written DAL should retry too, as per MSDN.
In a traditional normalized OLTP environment, NOLOCK is a code smell and almost certainly unnecessary in a properly designed system.
In a dimensional model, I used NOLOCK extensively to avoid locking very large fact and dimension tables which were being populated with later fact data (and dimensions may have been expiring). In the dimensional model, the facts either never change or never change after a certain point. Similarly, any dimension which is referenced will also be static, so for example, the NOLOCK will stop your long analysis operation on yesterday's data from blocking a dimension expiration during a data load for today's data.
You should only use NOLOCK on an unchanging table. Of course, this will then be the same as Read Committed Snapshot. Without the snapshot, you are only saving the time it takes to apply a shared lock and then remove it, which in most cases isn't necessary.
As for a changing table... NOLOCK doesn't just mean getting a row before a transaction is done updating all of its rows. You can get ghost data as data pages split, or even as index pages split. Or no data at all. That alone scared me away, but I think there may be even more scenarios where you simply get the wrong data.
Of course, NOLOCK for getting rough estimates or to just check in on a process might be reasonable.
Basic rule of thumb: if you care about the data at all, and the data is changing, then do not use NOLOCK.
