Why do we need Rigorous 2PL when we already have Strict 2PL? [duplicate] - database

In 2PL (two phase locking), what advantage(s) does the rigorous model have over the strict model?
I) There is no advantage over the strict model.
II) In contrast to the strict model, it guarantees that starvation cannot occur.
III) In contrast to the strict model, it guarantees that deadlock cannot occur.
IV) In contrast to the strict model, there is no need to predict data needed in the future.
My notes say all of the above are false. I am a bit confused. Can someone clarify for me why each of these is false?

What is the Two-Phase Locking (2PL) protocol?
A transaction is two-phase locked if:
before reading x, it sets a read lock on x
before writing x, it sets a write lock on x
it holds each lock until after it executes the corresponding operation
after its first unlock operation, it requests no new locks
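To make the two phases concrete, here is a toy sketch in Java (all names are made up for illustration; a real DBMS lock manager also handles lock queues, deadlock detection and much more):

    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.locks.Lock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    class TwoPhaseTransaction {
        // one shared read/write lock per data item, created on demand
        private static final Map<String, ReentrantReadWriteLock> LOCK_TABLE =
                new ConcurrentHashMap<>();
        private final Map<String, Lock> held = new LinkedHashMap<>(); // locks this tx holds
        private boolean shrinking = false; // becomes true at the first unlock

        public void read(String item)  { acquire(item, false); /* ... read item ...  */ }
        public void write(String item) { acquire(item, true);  /* ... write item ... */ }

        private void acquire(String item, boolean exclusive) {
            if (shrinking) // rule: after its first unlock, a tx requests no new locks
                throw new IllegalStateException("2PL violated: lock request after unlock");
            if (held.containsKey(item)) return; // already locked (lock upgrades omitted)
            ReentrantReadWriteLock rw =
                    LOCK_TABLE.computeIfAbsent(item, k -> new ReentrantReadWriteLock());
            Lock lock = exclusive ? rw.writeLock() : rw.readLock();
            lock.lock(); // growing phase: blocks while another tx holds a conflicting lock
            held.put(item, lock);
        }

        public void release(String item) { // the first call starts the shrinking phase
            shrinking = true;
            Lock lock = held.remove(item);
            if (lock != null) lock.unlock();
        }

        public void releaseAll() {
            for (String item : held.keySet().toArray(new String[0])) release(item);
        }
    }

The releaseAll() method is reused below to illustrate the strict and rigorous variants, which differ only in when locks may be given back.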
Now, what is strict two-phase locking?
Here a transaction must hold all its exclusive locks till it commits/aborts.
But what is rigorous 2PL?
Rigorous two-phase locking is even stricter: here all locks are held till commit/abort. In this protocol transactions can be serialized in the order in which they commit.
Going deeper:
Strict 2PL:
Same as 2PL, but all exclusive locks are held until the transaction has successfully committed or aborted. It guarantees cascadeless recoverability.
Rigorous 2PL:
Same as strict 2PL, but all locks (shared and exclusive) are held until the transaction has successfully committed or aborted. It is used in dynamic environments where data access patterns are not known beforehand.
Note that rigorous 2PL by itself does not prevent deadlock. When it is combined with a timestamp scheme such as wait-die, deadlock is avoided: a younger transaction requesting an item held by an older transaction is aborted and restarted with the same timestamp, so starvation is avoided as well.
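In terms of the toy TwoPhaseTransaction class sketched above, the variants differ only in when release() may be called (again just a sketch under the same assumptions, not how a real engine is written):

    class CommitRules {
        // Basic 2PL:    release(item) may be called any time once the growing
        //               phase is over.
        // Strict 2PL:   early release(item) is allowed only for read locks;
        //               write locks are kept and freed together at commit/abort.
        // Rigorous 2PL: no early release at all; the only unlock point is
        //               commit/abort, for shared and exclusive locks alike.
        static void commitRigorous(TwoPhaseTransaction tx) {
            tx.releaseAll(); // everything is freed here, which is why commit
                             // order equals serialization order under rigorous 2PL
        }
    }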
I hope the explanation above has made the concept, and the advantages of rigorous 2PL over the other variants, clear.
Thanks

I - there is an advantage
Look at this lecture note from UCLA:
Rigorous two-phase locking has the advantages of strict 2PL. In addition
it has the property that for two conflicting transactions, their commit order
is their serializability order. In some systems users might expect this behavior.
These lecture notes have an example (the model in the example is strict - not rigorous):
Consider two transactions conducted at the same site, in which a long-running transaction T1 that reads x is ordered before a short transaction T2 that writes x. T2 returns first, showing an updated version of x long before T1, which is based on the old version, completes.
II and III - does not affect deadlocks/starvation
Rigorous 2PL means that all locks are released after the transaction ends, as opposed to strict 2PL, where read-only locks may be released earlier. This doesn't affect deadlocks or starvation, as those occur in the expanding phase (a transaction cannot acquire a needed lock). In a deadlock, both processes are always in the expanding phase.
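To illustrate that point with the toy TwoPhaseTransaction class from the first answer, the following program intentionally hangs: each thread blocks in its expanding phase waiting for a lock the other holds, and neither ever reaches a release, so the strict/rigorous release rules never even come into play:

    class DeadlockDemo {
        public static void main(String[] args) {
            Thread t1 = new Thread(() -> {
                TwoPhaseTransaction tx = new TwoPhaseTransaction();
                tx.write("x");
                pause();
                tx.write("y");   // blocks forever: the other tx holds y
                tx.releaseAll(); // never reached; release rules are irrelevant here
            });
            Thread t2 = new Thread(() -> {
                TwoPhaseTransaction tx = new TwoPhaseTransaction();
                tx.write("y");
                pause();
                tx.write("x");   // blocks forever: the other tx holds x
                tx.releaseAll();
            });
            t1.start();
            t2.start();
        }

        private static void pause() { // give both threads time to take their first lock
            try { Thread.sleep(100); } catch (InterruptedException e) { }
        }
    }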
IV - both need to know the needed data for locking in the expanding phase - shrinking phase varies
Strict: I don't know the usual implementation details of strict 2PL, but if a read lock is released before the transaction ends, there has to be knowledge (a 100%-sure prediction, if you like) that the lock is not needed later in the transaction.
Rigorous: All the read locks are released at the end of a transaction and the transaction never has to evaluate if it should release a read lock or keep it for later reads in the transaction.
Is rigorous or strict more used/preferred?
Which of the two models to use depends on the situation. Modern DBMSs use more complex concurrency handling than simple rigorous or strict 2PL. Having said that, judging by the Wikipedia article on two-phase locking, the rigorous (SS2PL) model is more widely used:
SS2PL [rigorous] has been the concurrency control protocol of choice for most database systems and utilized since their early days in the 1970s. [...]
2PL in its general form, as well as when combined with Strictness, i.e., Strict 2PL (S2PL), are not known to be utilized in practice. The popular SS2PL does not require marking "end of phase-1" as 2PL and S2PL do, and thus is simpler to implement. Also, unlike the general 2PL, SS2PL provides, as mentioned above, the useful Strictness and Commitment ordering properties. [...]
SS2PL Vs. S2PL: Both provide Serializability and Strictness. Since S2PL is a super class of SS2PL it may, in principle, provide more concurrency. However, no concurrency advantage is typically practically noticed (exactly same locking exists for both, with practically not much earlier lock release for S2PL), and the overhead of dealing with an end-of-phase-1 mechanism in S2PL, separate from transaction-end, is not justified. Also, while SS2PL provides Commitment ordering, S2PL does not. This explains the preference of SS2PL over S2PL. [...]

Transaction T2 in the example above does not follow 2PL or S2PL, as it issues a lock request (lock B) after a lock release (unlock(A)), hence violating the protocol.
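Expressed with the toy TwoPhaseTransaction class from the first answer, T2's schedule trips the two-phase check (a fragment, to be read inside some method):

    TwoPhaseTransaction t2 = new TwoPhaseTransaction();
    t2.write("A");
    t2.release("A"); // first unlock: the shrinking phase begins
    t2.write("B");   // throws IllegalStateException: lock(B) after unlock(A)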

Rigorous two-phase locking is similar to strict two-phase locking, with two major differences:
In strict two-phase locking, shared locks are released in the shrinking phase, but in rigorous two-phase locking all the shared and exclusive locks are kept until the end of the transaction.
In rigorous two-phase locking, we do not need to know the access pattern of locks on data items beforehand, so it is more appropriate for dynamic environments, while in strict two-phase locking the access pattern of locks should be specified at the start of the transaction.
So the fourth option is the correct one.

Related

Database locks and isolation level in JPA

I am not sure I understand database locks. I am using the repeatable read isolation level. According to Wikipedia, it keeps read and write locks (acquired on selected data) until the end of the transaction.
Let's consider the following scenario: "Let's have two threads A and B. Thread A begins a transaction. Say thread A retrieves a list of all users from table User. (I am expecting here that thread A has acquired read & write locks on all users?) Thread B begins another transaction, retrieves one concrete User u from table User, updates u, and then commits the transaction. (Since A acquired the locks, does thread B have to wait until A commits its transaction?)"
Is the described behavior to be expected when using JPA?
Is the lock acquired if thread A reads the users outside a transaction (say, if I am using an extended persistence context)?
You are confusing the logical isolation level with its physical implementation. The SQL standard defines the four isolation levels Serializable, Repeatable Read, Read Committed and Read Uncommitted and the three ways in which serializability might be violated: dirty read, nonrepeatable read and phantom read.
How a particular DBMS achieves each level of isolation is an implementation detail which differs between DBMSs. Some DBMSs may use a locking strategy, in which read locks are taken so that writers are blocked until a transaction completes. Other DBMSs may use other strategies, such as multi-version concurrency control, in which readers and writers do not block each other. To maximize the performance and scalability of your application, you will need to code to the particular implementation of the DBMS you are using.
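For the JPA side of the question: if you need a read that is guaranteed to block concurrent writers until your transaction ends, it is safer to request the lock explicitly than to rely on how the provider implements the isolation level. A minimal sketch, assuming the User entity from the question and an active transaction:

    import javax.persistence.EntityManager;
    import javax.persistence.LockModeType;

    class UserDao {
        // must be called inside an active transaction
        User loadForUpdate(EntityManager em, Long userId) {
            // most providers translate this to SELECT ... FOR UPDATE (or the
            // DBMS equivalent), so a concurrent transaction updating this row
            // waits until ours commits or rolls back
            return em.find(User.class, userId, LockModeType.PESSIMISTIC_WRITE);
        }
    }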

Relationship between Isolation Level and Locks

I am trying to understand the relationship between Isolation Levels (http://msdn.microsoft.com/en-GB/library/ms173763.aspx) and locks (http://technet.microsoft.com/en-us/library/ms175519%28v=sql.105%29.aspx). I read a similar question on here and the answerer said
"Locks are implementation details. If you have a transaction that contains a SELECT:
SELECT * FROM SomeTable
You only have to worry about what data you will see from SomeTable"
However, if we do not have to worry about them, then why do we have table hints (http://msdn.microsoft.com/en-us/library/ms187373.aspx)? Do we have to think about locks?
Well, locks are partly an implementation detail. They are used (in SQL Server) to give you the guarantees that the isolation levels give you according to the standard. For example, SERIALIZABLE gives you as-if single-threaded access to the database. This is done by S-locking all data you have read in order to freeze it.
Locks are not purely a detail because the very details of a locking scheme leak out. They are detectable. Also, you can control locks using hints to create behavior that is not specified in the standard. Usually, one would do that to obtain stronger guarantees than before.
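For example, a table hint can strengthen a single statement beyond the session's isolation level. A minimal JDBC sketch in SQL Server syntax, borrowing SomeTable from the quote above (the id column is made up):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    class HintExample {
        // HOLDLOCK keeps the shared locks taken by this one statement until
        // the transaction ends, as if the statement ran under SERIALIZABLE
        static void readWithHoldLock(Connection conn, int id) throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT * FROM SomeTable WITH (HOLDLOCK) WHERE id = ?")) {
                ps.setInt(1, id);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) { /* ... */ }
                }
            }
        }
    }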

Pessimistic vs. optimistic concurrency control

I have a question about pessimistic versus optimistic locking.
Everybody says that "optimistic locking is used when you don't expect many collisions", for example:
which concurrency control is more efficient pessimistic or optimistic concurrency control
Optimistic vs. Pessimistic locking
For a school project I need to find the 'break-even' point where pessimistic locking becomes more appropriate than optimistic locking.
Now, I would like to know/understand why such a break-even point exists. How is it possible that pessimistic locking is more costly (in speed or memory usage?) than optimistic locking?
I suspect it is because of the extra read-operation that pessimistic locking requires. But with optimistic locking this extra read-operation is also needed (only just before the save operation), right?
Hopefully someone can explain this :)
Thank you!
Pessimism vs optimism in concurrency control is about transaction implementations interfering with each other. (Notwithstanding any definitions expressed at your links or by specific products.)
The supposed pessimistic attitude is: someone will interfere, so lock them out. The supposed optimistic attitude is: maybe no one will interfere, so go ahead until completion, then roll some process back if there was interference.
The costs are delays due to waiting by locked out processes vs delays due to re-computation by rolled back processes. We wish to optimize throughput given expected process properties and distribution.
(In your question you address only a given process rather than a collection and ignore a process having to wait or having to throw away work on rollback.)
EDIT
Think about what the words mean. Throughput involves work and time. A 'break-even point' presumes a dimension (interference) along which a quantity (throughput) differs between schemes (pessimistic/optimistic). You have to come up with a way to characterize and measure work and interference. You can see others' takes on reasonable interference to test in textbooks and their bibliography references, e.g. On Optimistic Methods for Concurrency Control.
Experimentally, calculate throughput for each scheme running your DBMS under varying amounts of interference.
The reality is that different interference workloads (i.e. different expected process properties and distributions) make the problem multidimensional. So you may want to calculate throughput as above for different interference scenarios.
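One way to see why a break-even point must exist is a toy cost model (my own simplification, not taken from the paper above): charge every pessimistic transaction some lock bookkeeping L and every optimistic transaction a cheaper validation V, and on a conflict (probability p) charge a wait W to the pessimistic scheme and a rollback-plus-redo R to the optimistic one, with R > W:

    class BreakEven {
        // cost_pessimistic(p) = L + p * W
        // cost_optimistic(p)  = V + p * R
        // The curves cross at p* = (L - V) / (R - W): below p* (few conflicts)
        // optimism wins, above it pessimism wins.
        static double breakEvenConflictRate(double lockOverhead, double validateOverhead,
                                            double waitCost, double redoCost) {
            return (lockOverhead - validateOverhead) / (redoCost - waitCost);
        }

        public static void main(String[] args) {
            // e.g. locking costs 2 units per tx, validation 1; a blocked tx
            // waits 10 units, a rolled-back tx redoes 50: break-even at 0.025
            System.out.println(breakEvenConflictRate(2, 1, 10, 50));
        }
    }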

When to prefer the pessimistic model of transaction isolation over the optimistic one?

Do I understand correctly that table/row lock hints are being used for pessimistic transaction (TX) isolation models of concurrency ONLY?
In other words, when can table/row lock hints be used during engagement of optimistic TX isolation provided by SQL Server (2005 and higher)?
When would one need pessimistic TX isolation levels/hints in SQL Server 2005+ if the latter provides built-in optimistic (aka snapshot, aka versioning) concurrency isolation?
I did read that pessimistic options are legacy and are not needed anymore, though I am in doubt.
Also, having optimistic (aka snapshot, aka versioning) TX isolation levels built into SQL Server 2005+,
when would one need to manually code for optimistic concurrency features?
The last question is inspired by having read:
"Optimistic Concurrency in SQL Server" (September 28, 2007)
describing custom coding to provide versioning in SQL Server.
Optimistic concurrency requires more resources and is more expensive when a conflict occurs.
Two sessions can read and modify the values, and the conflict only occurs when they try to apply their changes simultaneously. This means that in the case of a concurrent update, both values should be stored somewhere (which of course requires resources).
Also, when a conflict occurs, usually the whole transaction should be rolled back or the cursor refetched, which is expensive too.
The pessimistic concurrency model uses locking, thus reducing concurrency but improving performance.
In case of two concurrent tasks, it may be cheaper for the second task to wait for a lock to release than spending CPU time and disk I/O on two simultaneous works and then yet more on rolling back the less fortunate work and redoing it.
Say you have a query like this:

UPDATE  mytable
SET     myvalue = very_complex_function(@range)
WHERE   rangeid = @range

with very_complex_function reading some data from mytable itself. In other words, this query transforms the subset of mytable sharing the same value of rangeid.
Now, when two such queries work on the same range, there may be two scenarios:
Pessimistic: the first query locks, the second query waits for it. The first query completes in 10 seconds, the second one takes 10 more. Total: 20 seconds.
Optimistic: both queries work independently (on the same input). This shares CPU time between them, plus some overhead for switching. They also have to keep their intermediate data somewhere, so the data is stored twice (which implies twice the I/O or memory). Let's say both complete almost at the same time, in 15 seconds.
But when it's time to commit the work, the second query will conflict and will have to roll back its changes (say this takes the same 15 seconds). Then it needs to reread the data and do the work again with the new set of data (10 more seconds).
As a result, both queries complete later than with pessimistic locking: 15 and 40 seconds vs. 10 and 20.
When would one need pessimistic TX isolation levels/hints in SQL Server 2005+ if the latter provides built-in optimistic (aka snapshot, aka versioning) concurrency isolation?
Optimistic isolation levels are, well, optimistic. You should not use them when you expect high contention on your data.
BTW, optimistic isolation (for the read queries) was available in SQL Server 2000 too.
I have a detailed answer here: Developing Modifications that Survive Concurrency
I think there's a bit of confusion over terminology here.
Optimistic locking/optimistic concurrency/... is a programming technique used to avoid the following scenario:
start transaction
read data, setting a "read" lock on it to prevent any deletes/modifications to our data
display data on user's screen
await user input, lock remains active
keep awaiting user input, lock still preventing any writes/modifications
user input never comes (for whatever reason)
transaction times out (and this usually does not happen very rapidly, as the user must be given reasonable time to enter his input).
Optimistic locking replaces this with the following:
start transaction READ
read data, setting a "read" lock on it to prevent any deletes/modifications to our data
end transaction READ, releasing the read lock just set
display data on user's screen
await user input, but data can be modified/deleted meanwhile by other transactions
user input arrives
start transaction WRITE
verify that the data has remained unaltered, raising an exception if it has (a sketch of this check follows the list)
apply user updates
end transaction WRITE
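The "verify that the data has remained unaltered" step is commonly implemented with a version (or timestamp) column. A minimal JDBC sketch, with made-up table and column names:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    class OptimisticUpdate {
        // returns false if another transaction changed the row since we read it
        static boolean update(Connection conn, long id, String newValue,
                              int versionWeRead) throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE item SET value = ?, version = version + 1 " +
                    "WHERE id = ? AND version = ?")) {
                ps.setString(1, newValue);
                ps.setLong(2, id);
                ps.setInt(3, versionWeRead);    // the version seen in the READ transaction
                return ps.executeUpdate() == 1; // 0 rows: conflict, caller must retry/report
            }
        }
    }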
So the single "user transaction" to go fetch some data, and change and update them, consists of two distinct "database transactions". What is usually called "isolation levels" applies to those database transactions. The "optimistic locking" that you refer to applies to the "user transaction".
The matter is further complicated in that, broadly speaking, two completely distinct strategies are possible for the "isolating the database transactions" part:
MVCC
2-phase locking
I think the "snapshot versioning isolation level" means that the MVCC technique (well, one of its various possible variations) is being used for the database transaction. The other commonly known isolation levels apply more to transaction isolation using 2PL as the serialization(/isolation) technique. (And mixing them up can get messy ...)

How do ACID and database transactions work?

What is the relationship between ACID and database transaction?
Does ACID give database transaction or is it the same thing?
Could someone shed some light on this topic?
ACID is a set of properties that you would like to apply when modifying a database.
Atomicity
Consistency
Isolation
Durability
A transaction is a set of related changes which is used to achieve some of the ACID properties. Transactions are tools to achieve the ACID properties.
Atomicity means that you can guarantee that all of a transaction happens, or none of it does; you can do complex operations as one single unit, all or nothing, and a crash, power failure, error, or anything else won't allow you to be in a state in which only some of the related changes have happened.
Consistency means that you guarantee that your data will be consistent; none of the constraints you have on related data will ever be violated.
Isolation means that one transaction cannot read data from another transaction that is not yet completed. If two transactions are executing concurrently, each one will see the world as if they were executing sequentially, and if one needs to read data that is written by another, it will have to wait until the other is finished.
Durability means that once a transaction is complete, it is guaranteed that all of the changes have been recorded to a durable medium (such as a hard disk), and the fact that the transaction has been completed is likewise recorded.
So, transactions are a mechanism for guaranteeing these properties; they are a way of grouping related actions together such that as a whole, a group of operations can be atomic, produce consistent results, be isolated from other operations, and be durably recorded.
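A minimal JDBC sketch of such a grouping, using a hypothetical two-account transfer (table and column names are made up):

    import java.math.BigDecimal;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    class Transfer {
        static void transfer(Connection conn, long from, long to, BigDecimal amount)
                throws SQLException {
            conn.setAutoCommit(false); // group both updates into one transaction
            try (PreparedStatement debit = conn.prepareStatement(
                     "UPDATE account SET balance = balance - ? WHERE id = ?");
                 PreparedStatement credit = conn.prepareStatement(
                     "UPDATE account SET balance = balance + ? WHERE id = ?")) {
                debit.setBigDecimal(1, amount);
                debit.setLong(2, from);
                debit.executeUpdate();
                credit.setBigDecimal(1, amount);
                credit.setLong(2, to);
                credit.executeUpdate();
                conn.commit();   // atomicity: both changes become visible together...
            } catch (SQLException e) {
                conn.rollback(); // ...or neither does
                throw e;
            }
        }
    }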
ACID are desirable properties of any transaction processing engine.
A DBMS is (if it is any good) a particular kind of transaction processing engine that exposes, usually to a very large extent but not quite entirely, those properties.
But other engines exist that can also expose those properties. The kind of software that used to be called "TP monitors" being a case in point (nowadays' equivalent mostly being web servers).
Such TP monitors can access resources other than a DBMS (e.g. a printer), and still guarantee ACID toward their users. As an example of what ACID might mean when a printer is involved in a transaction:
Atomicity: an entire document gets printed or nothing at all
Consistency: at end-of-transaction, the paper feed is positioned at top-of-page
Isolation: no two documents get mixed up while printing
Durability: the printer can guarantee that it was not "printing" with empty cartridges.
What is the relationship between ACID and database transaction?
In a relational database, every SQL statement must execute in the scope of a transaction.
Without defining the transaction boundaries explicitly, the database is going to use an implicit transaction which wraps around every individual statement.
The implicit transaction begins before the statement is executed and ends (with commit or rollback) after the statement is executed.
The implicit transaction mode is commonly known as auto-commit.
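In JDBC terms (a small sketch; the table is hypothetical):

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    class AutoCommitDemo {
        static void demo(Connection conn) throws SQLException {
            try (Statement st = conn.createStatement()) {
                // auto-commit on (the JDBC default): this statement is wrapped
                // in its own implicit transaction and committed immediately
                st.executeUpdate("INSERT INTO t VALUES (1)");

                conn.setAutoCommit(false); // define the boundaries explicitly
                st.executeUpdate("INSERT INTO t VALUES (2)");
                st.executeUpdate("INSERT INTO t VALUES (3)");
                conn.commit(); // rows 2 and 3 commit (or roll back) together
            }
        }
    }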
A transaction is a collection of read/write operations succeeding only if all contained operations succeed.
Inherently a transaction is characterized by four properties (commonly referred as ACID):
Atomicity
Consistency
Isolation
Durability
Does ACID give database transaction or is it the same thing?
For a relational database system, this is true because the SQL Standard specifies that a transaction should provide the ACID guarantees:
Atomicity
Atomicity takes individual operations and turns them into an all-or-nothing unit of work, succeeding if and only if all contained operations succeed.
A transaction might encapsulate a state change (unless it is a read-only one). A transaction must always leave the system in a consistent state, no matter how many concurrent transactions are interleaved at any given time.
Consistency
Consistency means that constraints are enforced for every committed transaction. That implies that all keys, data types, checks and triggers are satisfied and no constraint violation is raised.
Isolation
Transactions require concurrency control mechanisms, and they guarantee correctness even when being interleaved. Isolation brings us the benefit of hiding uncommitted state changes from the outside world, as failing transactions shouldn’t ever corrupt the state of the system. Isolation is achieved through concurrency control using pessimistic or optimistic locking mechanisms.
Durability
A successful transaction must permanently change the state of a system, and before ending, its state changes are recorded in a persisted transaction log. If the system is suddenly hit by a crash or a power outage, all committed transactions whose changes have not yet been applied may be replayed from the log.
I slightly modified the printer example to make it easier to explain.
A document with 2 pages of content was sent to the printer.
Transaction - the document sent to the printer
Atomicity - the printer prints 2 pages of the document or none
Consistency - the printer prints half a page and the paper gets stuck; the printer restarts itself and prints the complete 2 pages with all the content
Isolation - while many printouts were in progress, the printer prints the right content of each document
Durability - while printing, there was a power cut; the printer again prints the documents without any errors
Hope this helps someone get the hang of the concept of ACID.
The ACID properties are a very old and important concept of database theory. I know that you can find lots of posts on this topic, but I would still like to share an answer, because this is a very important topic of RDBMSs.
A database system handles lots of different types of transactions, and every transaction has certain characteristics. These characteristics are known as the ACID properties.
The ACID properties guarantee that every database transaction accomplishes its task reliably.
Atomicity: either commit all or nothing.
Consistency: make records consistent by validating all rules and constraints of the transaction.
Isolation: make sure that two transactions are unaware of each other.
Durability: committed data is stored permanently.
To quote Wikipedia:
ACID (atomicity, consistency, isolation, durability) is a set of properties that guarantee database transactions are processed reliably.
A DBMS that supports transactions will strive to support all of these properties - any commercial DBMS (as well as several open-source DBMSs) provides full ACID 'support' - although it's often possible (for example, with varying isolation levels in MSSQL) to lessen the ACIDness - thus losing the guarantee of fully transactional behaviour.
ACID Properties in Databases:
Atomicity : Transactions are all or nothing
Consistency: Only valid data is saved (the database moves from one consistent state to another consistent state.)
Isolation: Transactions do not affect each other (multiple transactions can run at the same time in the system; executing multiple transactions in parallel must have the same result as running them sequentially.)
Durability: Written data will not be lost (even if the database crashes immediately or in the event of a power loss.)
[Gray] introduced the ACD properties for a transaction in 1981. In 1983 [Haerder] added the Isolation property. In my opinion, the ACD properties would be a more useful set of properties to discuss. One interpretation of Atomicity (that the transaction should be atomic as seen from any client at any time) would actually imply the isolation property. The "isolation" property is useful when the transaction is not isolated, that is, when the isolation property is relaxed. In ANSI SQL speak: if the isolation level is weaker than SERIALIZABLE. But when the isolation level is SERIALIZABLE, the isolation property is not really of interest.
I have written more about this in a blog post: "ACID Does Not Make Sense".
http://blog.franslundberg.com/2013/12/acid-does-not-make-sense.html
[Gray] The Transaction Concept, Jim Gray, 1981.
http://research.microsoft.com/en-us/um/people/gray/papers/theTransactionConcept.pdf
[Haerder] Principles of Transaction-Oriented Database Recovery, Haerder and Reuter, 1983.
http://www.stanford.edu/class/cs340v/papers/recovery.pdf
A transaction can be defined as a collection of tasks that is considered a minimum processing unit. A minimum processing unit cannot be divided further.
Every transaction must have four properties, commonly known as the ACID properties; i.e., ACID is the group of properties of any transaction:
Atomicity
Consistency
Isolation
Durability
