Reading query within a database transaction

I was wondering what the point is of performing a read from a database table within a database transaction. In my case, it is one single SELECT statement on one table. This table could be updated by other threads. How does a database transaction help me in this situation? I am having trouble finding information on this specific case.

The transaction lets you define the level of separation between your query and concurrent updates. You define how concurrent updates can influence the data you get as a result, which may be important for ensuring data consistency in the result. The common isolation modes are:
none: No transaction isolation.
read-committed: Dirty reads are prevented; non-repeatable reads and phantom reads can occur.
read-uncommitted: Dirty reads, non-repeatable reads and phantom reads can occur.
repeatable-read: Dirty reads and non-repeatable reads are prevented; phantom reads can occur.
serializable: Dirty reads, non-repeatable reads, and phantom reads are prevented.
EDIT: A non-repeatable read means that the data you get from a query takes newly committed data into account, so the result can be different if the query is executed multiple times. A dirty read is similar, but you can also read uncommitted data from other transactions.
You should only care about isolation if you need consistent data at all times. It might sound strange not to care about consistency, but it is actually often traded for performance in real applications.
For example, the 'x in stock' display you see in many web shops is often obtained without complete isolation. If two customers try to buy the last item in stock, one will get the last item and the other will have to wait until the item has been reordered by the shop.
How you determine and set the isolation level depends on the language, database and frameworks you are using. With JDBC (which Hibernate uses under the hood), for example, you can set the isolation level on the Connection:
connection.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
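The same idea can be expressed directly in SQL. Below is a minimal T-SQL sketch, assuming a made-up products table, of running the single SELECT under a stricter isolation level:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;
-- Rows read here keep their shared locks until COMMIT, so concurrent
-- updates from other threads cannot change them while the transaction is open.
SELECT quantity_in_stock
FROM products
WHERE product_id = 42;
COMMIT TRANSACTION;
Under the default READ COMMITTED the same SELECT would only be protected against dirty reads; the stricter level is only worth its extra locking if you actually need the stronger guarantee.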

Related

Definition relationship between Read Consistency in Oracle and Consistency in database ACID

Definition for Consistency:
In database systems, a consistent transaction is one that starts with a database in a consistent state and ends with the database in a consistent state. Any data written to the database must be valid according to all defined rules, including but not limited to constraints, cascades, triggers, and any combination thereof.
Definition for Read Consistency:
Oracle always enforces statement-level read consistency. This guarantees that all the data returned by a single query comes from a single point in time—the time that the query began. Therefore, a query never sees dirty data or any of the changes made by transactions that commit during query execution.
I'm puzzled: it seems that Read Consistency is a kind of Isolation rather than Consistency. Is that true?
That's right, but the concepts are related and the verbiage may be mixed up a bit.
"Consistency" (as in ACID) means that when you update the database, you cannot put it into an inconsistent state (the database will enforce that all constraints are met).
"Read Consistency" is one of the "Transaction Isolation" levels that describe how good concurrent transactions are isolated from each-other (i.e. in how far they can treat the database as if only they work on it).
Okay, let me rephrase that. The four transaction isolation levels describe what happens when you run multiple queries in the same transaction, so they can provide a potentially much stronger consistency guarantee.
The "Read Consistency" you mentioned is about a single query returning results that are "internally consistent": They represent data that existed at one point in time (when the query started). You won't get weird results for long-running queries.
I think this is a bit stronger of a guarantee than READ COMMITTED (you don't see anything that is not yet committed -- no dirty reads -- but you also don't see anything that has been committed after your query started).
But it does not imply that you get the same results when you run the same query in the same transaction five minutes later: you might see data that has been updated (and committed!) in the mean-time. If you don't want that, you need REPEATABLE READ.
As far as I know, READ UNCOMMITTED is not available in Oracle. That is probably what they mean by "Oracle always enforces statement-level read consistency".
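To illustrate the difference, here is a sketch assuming a hypothetical orders table: Oracle's default READ COMMITTED gives each statement its own consistent snapshot, while SERIALIZABLE gives the whole transaction a single snapshot.
-- Statement-level read consistency (Oracle's default READ COMMITTED):
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT COUNT(*) FROM orders;   -- snapshot as of this statement's start
-- ... another session inserts rows and commits ...
SELECT COUNT(*) FROM orders;   -- may return a larger count
COMMIT;
-- Transaction-level consistency (the whole transaction sees one snapshot):
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT COUNT(*) FROM orders;
-- ... another session inserts rows and commits ...
SELECT COUNT(*) FROM orders;   -- same result as the previous SELECT
COMMIT;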

SQL Server: how to read data from a table that is still being populated

I have two processes working simultaneously: one is generating a big table, and the other reads data sequentially from the table being generated. I noticed that the second process has to wait for the first one to finish before it can start, i.e., the read operation blocks.
My question is whether there is any way to allow the second process to read data from the table while it is still being populated. Thanks!
Please, do not suggest using (nolock) for data that is not static. You may skip records or read records twice. Not good if it is financial data.
http://www.jasonstrate.com/2012/06/the-side-effect-of-nolock/
You should really take a look at my presentation 'How isolated are your sessions' - http://craftydba.com/?page_id=880.
What you are describing above is the behaviour of one of the isolation levels, read committed: writers block readers. It is just a fact of life when dealing with transactions (sessions).
There are a bunch more isolation levels.
http://technet.microsoft.com/en-us/library/ms189122(v=sql.105).aspx
The (NOLOCK) or (READUNCOMMITTED) hint has three side effects: dirty reads, non-repeatable reads, and phantom reads.
How about using Read Committed Snapshot Isolation (RCSI)?
It is a version of Read Committed in which readers are not blocked, since the version store (tempdb) keeps a copy of the records. It does not have as much of an impact as SNAPSHOT ISOLATION. Put in place some type of monitoring on version store growth.
http://www.brentozar.com/archive/2013/01/implementing-snapshot-or-read-committed-snapshot-isolation-in-sql-server-a-guide/
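As a rough sketch (the database and table names are placeholders), enabling RCSI is a single database-level setting, after which plain SELECT statements read the last committed version instead of blocking:
ALTER DATABASE MyReportingDb
    SET READ_COMMITTED_SNAPSHOT ON
    WITH ROLLBACK IMMEDIATE;   -- kicks out open sessions, so schedule this carefully

-- The second process can now read while the first one is still loading,
-- without (NOLOCK) and without dirty reads:
SELECT *
FROM dbo.BigTable
WHERE batch_id = 17;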
Like any advice at the transaction level, first test this change in a lower environment. Have a full understanding of the six isolation levels and how they affect your database and application.

When to prefer a pessimistic model of transaction isolation over an optimistic one?

Do I understand correctly that table/row lock hints are used for pessimistic transaction (TX) isolation models of concurrency ONLY?
In other words, can table/row lock hints be used while the optimistic TX isolation provided by SQL Server (2005 and higher) is engaged?
When would one need pessimistic TX isolation levels/hints in SQL Server 2005+ if the latter provides built-in optimistic (aka snapshot, aka versioning) concurrency isolation?
I did read that pessimistic options are legacy and are not needed anymore, though I have my doubts.
Also, given that optimistic (aka snapshot, aka versioning) TX isolation levels are built into SQL Server 2005+,
when would one need to manually code for optimistic concurrency features?
The last question is inspired by having read:
"Optimistic Concurrency in SQL Server" (September 28, 2007)
describing custom coding to provide versioning in SQL Server.
Optimistic concurrency requires more resources and is more expensive when a conflict occurs.
Two sessions can read and modify the values, and the conflict only occurs when they try to apply their changes simultaneously. This means that in the case of a concurrent update both values have to be stored somewhere (which of course requires resources).
Also, when a conflict occurs, usually the whole transaction has to be rolled back or the cursor refetched, which is expensive too.
A pessimistic concurrency model uses locking, which reduces concurrency but improves performance.
In the case of two concurrent tasks, it may be cheaper for the second task to wait for a lock to be released than to spend CPU time and disk I/O on two simultaneous pieces of work and then yet more on rolling back the less fortunate one and redoing it.
Say you have a query like this:
UPDATE mytable
SET myvalue = very_complex_function(@range)
WHERE rangeid = @range
Here, very_complex_function reads some data from mytable itself. In other words, this query transforms the subset of mytable that shares the value of @range.
Now, when two functions work on the same range, there may be two scenarios:
Pessimistic: the first query locks, the second query waits for it. The first query completes in 10 seconds, the second one does too. Total: 20 seconds.
Optimistic: both queries work independently (on the same input). This shares CPU time between them, plus some overhead on switching. They have to keep their intermediate data somewhere, so the data is stored twice (which implies twice the I/O or memory). Let's say both complete almost at the same time, in 15 seconds.
But when it's time to commit the work, the second query will conflict and will have to rollback its changes (say, it takes the same 15 seconds). Then it needs to reread the data again and do the work again, with the new set of data (10 seconds).
As a result, both queries complete later than with a pessimistic locking: 15 and 40 seconds vs. 10 and 20.
When would one need pessimistic TX isolation levels/hints in SQL Server 2005+ if the latter provides built-in optimistic (aka snapshot, aka versioning) concurrency isolation?
Optimistic isolation levels are, well, optimistic. You should not use them when you expect high contention on your data.
BTW, optimistic isolation (for the read queries) was available in SQL Server 2000 too.
I have a detailed answer here: Developing Modifications that Survive Concurrency
I think there is a bit of confusion over terminology here.
Optimistic locking (also known as optimistic concurrency) is a programming technique used to avoid the following scenario:
start transaction
read data, setting a "read" lock on it to prevent any deletes/modifications to our data
display data on user's screen
await user input, lock remains active
keep awaiting user input, lock still preventing any writes/modifications
user input never comes (for whatever reason)
transaction times out (and this usually does not happen very quickly, as the user must be given reasonable time to enter their input).
Optimistic locking replaces this with the following:
start transaction READ
read data, setting a "read" lock on it to prevent any deletes/modifications to our data
end transaction READ, releasing the read lock just set
display data on user's screen
await user input, but data can be modified/deleted meanwhile by other transactions
user input arrives
start transaction WRITE
verify that the data has remained unaltered, raising an exception if it has changed
apply user updates
end transaction WRITE
So the single "user transaction" that fetches some data, changes it, and writes the update back actually consists of two distinct "database transactions". What is usually called "isolation levels" applies to those database transactions. The "optimistic locking" that you refer to applies to the "user transaction".
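A common way to implement the "verify that the data has remained unaltered" step is a version column that is checked and incremented in the same UPDATE. The T-SQL below is only a sketch; the documents table, its row_version column and the variable values are invented.
-- Values captured during the earlier READ transaction:
DECLARE @id INT = 42,
        @new_body NVARCHAR(MAX) = N'edited text',
        @version_read_earlier INT = 7;

BEGIN TRANSACTION;   -- the WRITE transaction of the optimistic pattern

UPDATE dbo.documents
SET body = @new_body,
    row_version = row_version + 1
WHERE document_id = @id
  AND row_version = @version_read_earlier;   -- matches nothing if another session changed the row

IF @@ROWCOUNT = 0
BEGIN
    ROLLBACK TRANSACTION;   -- the data changed in the meantime: report a conflict instead of overwriting
END
ELSE
BEGIN
    COMMIT TRANSACTION;
END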
The matter is further complicated in that, broadly speaking, two completely distinct strategies are possible for isolating the database transactions:
MVCC
2-phase locking
I think the "snapshot versioning isolation level" means that the MVCC technique (well, one of its various possible variations) is being used for the database transaction. The other commonly known isolation levels apply more to transaction isolation using 2PL as the serialization(/isolation) technique. (And mixing them up can get messy ...)

Dirty data in DBMS and degree of isolation?

Hi all, I want to know what dirty data actually means in a DBMS,
and how the degree of isolation of a transaction (program) in a DBMS can be determined.
What you probably mean is "dirty read". This is what you can get when using the transaction isolation level 'read uncommitted'. The Wikipedia article on transaction isolation has more information about the isolation levels. But be aware that some databases support multi-version concurrency, where things are a bit different.
Dirty data refers to data that contains erroneous information. It may also be used when referring to data that is in memory and not yet loaded into a database. The complete removal of dirty data from a source is impractical or virtually impossible. Dirty data can be caused by a number of factors including duplicate records, incomplete or outdated data, and the improper parsing of record fields from disparate systems.
The following data can be considered as dirty data:
Misleading data
Duplicate data
Incorrect data
Inaccurate data
Non-integrated data
Data that violates business rules
Data without a generalized formatting
Incorrectly punctuated or spelled data
In database systems, isolation determines how transaction integrity is visible to other users and systems.
The SQL standard defines four isolation levels:
Read Uncommitted – Read Uncommitted is the lowest isolation level. At this level, one transaction may read not-yet-committed changes made by another transaction, thereby allowing dirty reads (see the sketch after this list). At this level, transactions are not isolated from each other.
Read Committed – This isolation level guarantees that any data read was committed at the moment it was read, so it does not allow dirty reads. The transaction holds a read or write lock on the current row, and thus prevents other transactions from reading, updating or deleting it.
Repeatable Read – This is a more restrictive isolation level. The transaction holds read locks on all rows it references and write locks on all rows it inserts, updates, or deletes. Since other transactions cannot read, update or delete these rows, non-repeatable reads are avoided.
Serializable – This is the highest isolation level. It guarantees serializable execution: concurrently executing transactions appear to be executing serially.
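As a concrete illustration of a dirty read, consider two sessions and an invented accounts table:
-- Session 1: modifies a row but does not commit yet.
BEGIN TRANSACTION;
UPDATE dbo.accounts SET balance = balance - 100 WHERE account_id = 1;
-- ... no COMMIT yet ...

-- Session 2: under READ UNCOMMITTED it sees the in-flight value.
-- If session 1 later rolls back, session 2 has read data that never officially existed.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT balance FROM dbo.accounts WHERE account_id = 1;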

Integrity and Confidentiality in Distributed Transactions

I have a question regarding distributed transactions. Let's assume I have three transaction programs:
Transaction A
begin
a=read(A)
b=read(B)
c=a+b
write(C,c)
commit
Transaction B
begin
a=read(A)
a=a+1
write(A,a)
commit
Transaction C
begin
c=read(C)
c=c*2
write(A,c)
commit
So there are 5 pairs of critical operations: C2-A5, A2-B4, B4-C4, B2-C4, A2-C4.
I should ensure integrity and confidentiality, do you have any idea of how to achieve it?
Thank you in advance!
What you have described in your post is a common situation in multi-user systems. Different sessions simultaneously start transactions using the same tables and indeed the same rows. There are two issues here:
What happens if Session C reads a record after Session A has updated it but before Session A has committed its transaction?
What happens if Session C updates the same record which Session A has updated but not committed?
(Your scenario only illustrates the first of these issues).
The answer to the first question is isolation level. This is the definition of the visibility of uncommitted transactions across sessions. The ANSI standard specifies four levels:
SERIALIZABLE: no changes from another session are ever visible.
REPEATABLE READ: phantom reads are allowed, that is, the same query executed twice may return additional rows that another session has inserted and committed in the meantime.
READ COMMITTED: only changes which have been committed by another session are visible.
READ UNCOMMITTED: dirty reads allowed, that is, uncommitted changes from one session are visible in another.
Different flavours of database implement these in different fashions, and not all databases support all of them. For instance, Oracle only supports READ COMMITTED and SERIALIZABLE, and it implements SERIALIZABLE as a snapshot (i.e. it is a read-only transaction). However, it uses multiversion concurrency control to prevent non-repeatable reads in READ COMMITTED transactions.
So, coming back to your question, the answer is: set the appropriate Isolation Level. What the appropriate level is depends on what levels your database supports, and what behaviour you wish to happen. Probably you want READ COMMITTED or SERIALIZABLE, that is you want your transactions to proceed on the basis of data values being consistent with the start of the transaction.
As to the other matter, the answer is simpler: transactions must issue locks on tables or preferably just the required rows, before they start to update them. This ensures that the transaction can proceed to change those values without causing a deadlock. This is called pessimistic locking. It is not possible in applications which use connection pooling (i.e. most web-based applications), and the situation there is much gnarlier.
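For example, in SQL Server a single row can be locked pessimistically with a locking hint inside the transaction (other databases use SELECT ... FOR UPDATE); the items table and the values below are purely illustrative:
BEGIN TRANSACTION;

DECLARE @current INT;

-- Take an update lock on just the row we intend to change before reading it.
SELECT @current = val
FROM dbo.items WITH (UPDLOCK, ROWLOCK)
WHERE item_id = 1;

-- No other session can take a conflicting lock on this row until we commit.
UPDATE dbo.items
SET val = @current * 2
WHERE item_id = 1;

COMMIT TRANSACTION;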
