Relationship between Isolation Level and Locks - sql-server

I am trying to understand the relationship between Isolation Levels (http://msdn.microsoft.com/en-GB/library/ms173763.aspx) and locks (http://technet.microsoft.com/en-us/library/ms175519%28v=sql.105%29.aspx). I read a similar question on here and the answerer said
"Locks are implementation details. If you have a transaction that contains a SELECT:
SELECT * FROM SomeTable
You only have to worry about what data you will see from SomeTable"
However, if we do not have to worry about them then why do we have: Table Hints (http://msdn.microsoft.com/en-us/library/ms187373.aspx). Do we have to think about locks?

Well, locks are partly an implementation detail. They are used (in SQL Server) to give you the guarantees that the isolation levels give you according to the standard. For example, SERIALIZABLE gives you as-if single-threaded access to the database. This is done by S-locking all data you have read in order to freeze it.
Locks are not purely a detail because the very details of a locking scheme leak out. They are detectable. Also, you can control locks using hints to create behavior that is not specified in the standard. Usually, one would do that to obtain stronger guarantees than before.

Related

How to use SET TRANSACTION ISOLATION LEVEL? [duplicate]

Could someone give me some guidance on when I should use WITH (NOLOCK) as opposed to SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
What are the pros/cons of each? Are there any unintended consequences you've run into using one as opposed to the other?
They are the same thing. If you use the set transaction isolation level statement, it will apply to all the tables in the connection, so if you only want a nolock on one or two tables use that; otherwise use the other.
Both will give you dirty reads. If you are okay with that, then use them. If you can't have dirty reads, then consider snapshot or serializable hints instead.
WITH (NOLOCK) is a hint on a table level. Setting the transaction isolation level to READ_UNCOMMITTED with affect the connection. The difference is in terms of scope. See READUNCOMMITTED and NOLOCK in the SQL Server documentation here:
http://technet.microsoft.com/en-us/library/ms187373.aspx
For TRANSACTION ISOLATION LEVEL:
http://technet.microsoft.com/en-us/library/ms173763.aspx
NOLOCK is local to the table (or views etc)
READ UNCOMMITTED is per session/connection
As for guidelines... a random search from StackOverflow and the electric interweb...
Is the NOLOCK (Sql Server hint) bad practice?
When is it appropriate to use NOLOCK?
Get rid of those NOLOCK hints…
To my knowledge the only difference is the scope of the effects as Strommy said. NOLOCK hint on a table and the READ UNCOMMITTED on the session.
As to problems that can occur, it's all about consistency. If you care then be aware that you could get what is called dirty reads which could influence other data being manipulated on incorrect information.
I personally don't think I have seen any problems from this but that may be more due to how I use nolock. You need to be aware that there are scenarios where it will be OK to use. Scenarios where you are mostly adding new data to a table but have another process that comes in behind to check for a data scenario. That will probably be OK since the major flow doesn't include going back and updating rows during a read.
Also I believe that these days you should look into Multi-version Concurrency Control. I believe they added it in 2005 and it helps stop the writers from blocking readers by giving readers a snapshot of the database to use. I'll include a link and leave further research to the reader:
MVCC
Database Isolation Levels
You cannot use Set Transaction Isolation Level Read Uncommitted in a View (you can only have one script in there in fact), so you would have to use (nolock) if dirty rows should be included.
As you have to use WITH (NOLOCK) for each table it might be annoying to write it in every FROM or JOIN clause. However it has a reason why it is called a "dirty" read. So you really should know when you do one, and not set it as default for the session scope. Why?
Forgetting a WITH (NOLOCK) might not affect your program in a very dramatic way, however doing a dirty read where you do not want one can make the difference in certain circumstances.
So use WITH (NOLOCK) if the current data selected is allowed to be incorrect, as it might be rolled back later. This is mostly used when you want to increase performance, and the requirements on your application context allow it to take the risk that inconsistent data is being displayed. However you or someone in charge has to weigh up pros and cons of the decision of using WITH (NOLOCK).

Definition relationship between Read Consistency in Oracle and Consistency in database ACID

Definition for Consistency:
In database systems, a consistent transaction is one that starts with a database in a consistent state and ends with the database in a consistent state. Any data written to the database must be valid according to all defined rules, including but not limited to constraints, cascades, triggers, and any combination thereof.
Definition for Read Consistency:
Oracle always enforces statement-level read consistency. This guarantees that all the data returned by a single query comes from a single point in time—the time that the query began. Therefore, a query never sees dirty data or any of the changes made by transactions that commit during query execution.
I've got puzzled, it seems that Read Consistency is kind of Isolation but not Consistency. Is that true?
That's right, but the concepts are related and the verbage maybe mixed up a bit.
"Consistency" (as in ACID) means that when you update the database, you cannot put it into an inconsistent state (the database will enforce that all constraints are met).
"Read Consistency" is one of the "Transaction Isolation" levels that describe how good concurrent transactions are isolated from each-other (i.e. in how far they can treat the database as if only they work on it).
Okay, let me rephrase that. The four transaction isolation levels describe what happens when you run multiple queries in the same transaction. So that is a potentially much stronger consistency.
The "Read Consistency" you mentioned is about a single query returning results that are "internally consistent": They represent data that existed at one point in time (when the query started). You won't get weird results for long-running queries.
I think this is a bit stronger of a guarantee than READ COMMITTED (you don't see anything that is not yet committed -- no dirty reads -- but you also don't see anything that has been committed after your query started).
But it does not imply that you get the same results when you run the same query in the same transaction five minutes later: you might see data that has been updated (and committed!) in the mean-time. If you don't want that, you need REPEATABLE READ.
As far as I know, READ UNCOMMITTED is not available in Oracle. That is probably what they mean by "Oracle always enforces statement-level read consistency".

Integrity and Confidentiality in Distributed Transactions

I've a question regarding distributed transactions. Let's assume I have 3 transaction programs:
Transaction A
begin
a=read(A)
b=read(B)
c=a+b
write(C,c)
commit
Transaction B
begin
a=read(A)
a=a+1
write(A,a)
commit
Transaction C
begin
c=read(C)
c=c*2
write(A,c)
commit
So there are 5 pairs of critical operations: C2-A5, A2-B4, B4-C4, B2-C4, A2-C4.
I should ensure integrity and confidentiality, do you have any idea of how to achieve it?
Thank you in advance!
What you have described in your post is a common situation in multi-user systems. Different sessions simultaneously start transactions using the same tables and indeed the same rows. There are two issues here:
What happens if Session C reads a record after Session A has updated it but before Session A has committed its trandsaction?
What happens if Session C updates the same record which Session A has updated but not committed?
(Your scenario only illustrates the first of these issues).
The answer to the first question is ioslation level. This is the definition of the visibility of uncommmitted transactions across sessions. The ANSI standard specifies four levels:
SERIALIZABLE: no changes from another session are ever visible.
REPEATABLE READ: phantom reads allowed, that is the same query executed twice may return different results.
READ COMMITTED: only changes which have been committed by another session are visible.
READ UNCOMMITTED: diryt readsallowed, that is uncommitted changes from one session are visible in another.
Different flavours or database implement these in different fashions, and not all databases support all of them. For instance, Oracle only supports READ COMMITTED and SERIALIZABLE, and it implements SERIALIZABLE as a snapsot (i.e. it is a read-only transaction). However, it uses multiversion concurrency control to prevent non-repeatable reads in READ COMMITTED transactions.
So, coming back to your question, the answer is: set the appropriate Isolation Level. What the appropriate level is depends on what levels your database supports, and what behaviour you wish to happen. Probably you want READ COMMITTED or SERIALIZABLE, that is you want your transactions to proceed on the basis of data values being consistent with the start of the transaction.
As to the other matter, the answer is simpler: transactions must issue locks on tables or preferably just the required rows, before they start to update them. This ensures that the transaction can proceed to change those values without causing a deadlock. This is called pessimistic locking. It is not possible in applications which use connection pooling (i.e. most web-based applications), and the situation there is much gnarlier.

Why use a READ UNCOMMITTED isolation level?

In plain English, what are the disadvantages and advantages of using
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
in a query for .NET applications and reporting services applications?
This isolation level allows dirty reads. One transaction may see uncommitted changes made by some other transaction.
To maintain the highest level of isolation, a DBMS usually acquires locks on data, which may result in a loss of concurrency and a high locking overhead. This isolation level relaxes this property.
You may want to check out the Wikipedia article on READ UNCOMMITTED for a few examples and further reading.
You may also be interested in checking out Jeff Atwood's blog article on how he and his team tackled a deadlock issue in the early days of Stack Overflow. According to Jeff:
But is nolock dangerous? Could you end
up reading invalid data with read uncommitted on? Yes, in theory. You'll
find no shortage of database
architecture astronauts who start
dropping ACID science on you and all
but pull the building fire alarm when
you tell them you want to try nolock.
It's true: the theory is scary. But
here's what I think: "In theory there
is no difference between theory and
practice. In practice there is."
I would never recommend using nolock
as a general "good for what ails you"
snake oil fix for any database
deadlocking problems you may have. You
should try to diagnose the source of
the problem first.
But in practice adding nolock to queries that you absolutely know are simple, straightforward read-only affairs never seems to lead to problems... As long as you know what you're doing.
One alternative to the READ UNCOMMITTED level that you may want to consider is the READ COMMITTED SNAPSHOT. Quoting Jeff again:
Snapshots rely on an entirely new data change tracking method ... more than just a slight logical change, it requires the server to handle the data physically differently. Once this new data change tracking method is enabled, it creates a copy, or snapshot of every data change. By reading these snapshots rather than live data at times of contention, Shared Locks are no longer needed on reads, and overall database performance may increase.
My favorite use case for read uncommited is to debug something that is happening inside a transaction.
Start your software under a debugger, while you are stepping through the lines of code, it opens a transaction and modifies your database. While the code is stopped, you can open a query analyzer, set on the read uncommited isolation level and make queries to see what is going on.
You also can use it to see if long running procedures are stuck or correctly updating your database using a query with count(*).
It is great if your company loves to make overly complex stored procedures.
This can be useful to see the progress of long insert queries, make any rough estimates (like COUNT(*) or rough SUM(*)) etc.
In other words, the results the dirty read queries return are fine as long as you treat them as estimates and don't make any critical decisions based upon them.
The advantage is that it can be faster in some situations. The disadvantage is the result can be wrong (data which hasn't been committed yet could be returned) and there is no guarantee that the result is repeatable.
If you care about accuracy, don't use this.
More information is on MSDN:
Implements dirty read, or isolation level 0 locking, which means that no shared locks are issued and no exclusive locks are honored. When this option is set, it is possible to read uncommitted or dirty data; values in the data can be changed and rows can appear or disappear in the data set before the end of the transaction. This option has the same effect as setting NOLOCK on all tables in all SELECT statements in a transaction. This is the least restrictive of the four isolation levels.
When is it ok to use READ UNCOMMITTED?
Rule of thumb
Good: Big aggregate reports showing constantly changing totals.
Risky: Nearly everything else.
The good news is that the majority of read-only reports fall in that Good category.
More detail...
Ok to use it:
Nearly all user-facing aggregate reports for current, non-static data e.g. Year to date sales.
It risks a margin of error (maybe < 0.1%) which is much lower than other uncertainty factors such as inputting error or just the randomness of when exactly data gets recorded minute to minute.
That covers probably the majority of what an Business Intelligence department would do in, say, SSRS. The exception of course, is anything with $ signs in front of it. Many people account for money with much more zeal than applied to the related core metrics required to service the customer and generate that money. (I blame accountants).
When risky
Any report that goes down to the detail level. If that detail is required it usually implies that every row will be relevant to a decision. In fact, if you can't pull a small subset without blocking it might be for the good reason that it's being currently edited.
Historical data. It rarely makes a practical difference but whereas users understand constantly changing data can't be perfect, they don't feel the same about static data. Dirty reads won't hurt here but double reads can occasionally be. Seeing as you shouldn't have blocks on static data anyway, why risk it?
Nearly anything that feeds an application which also has write capabilities.
When even the OK scenario is not OK.
Are any applications or update processes making use of big single transactions? Ones which remove then re-insert a lot of records you're reporting on? In that case you really can't use NOLOCK on those tables for anything.
Use READ_UNCOMMITTED in situation where source is highly unlikely to change.
When reading historical data. e.g some deployment logs that happened two days ago.
When reading metadata again. e.g. metadata based application.
Don't use READ_UNCOMMITTED when you know souce may change during fetch operation.
Regarding reporting, we use it on all of our reporting queries to prevent a query from bogging down databases. We can do that because we're pulling historical data, not up-to-the-microsecond data.
This will give you dirty reads, and show you transactions that's not committed yet. That is the most obvious answer. I don't think its a good idea to use this just to speed up your reads. There is other ways of doing that if you use a good database design.
Its also interesting to note whats not happening. READ UNCOMMITTED does not only ignore other table locks. It's also not causing any locks in its own.
Consider you are generating a large report, or you are migrating data out of your database using a large and possibly complex SELECT statement. This will cause a shared lock that's may be escalated to a shared table lock for the duration of your transaction. Other transactions may read from the table, but updates are impossible. This may be a bad idea if its a production database since the production may stop completely.
If you are using READ UNCOMMITTED you will not set a shared lock on the table. You may get the result from some new transactions or you may not depending where it the table the data were inserted and how long your SELECT transaction have read. You may also get the same data twice if for example a page split occurs (the data will be copied to another location in the data file).
So, if its very important for you that data can be inserted while doing your SELECT, READ UNCOMMITTED may make sense. You have to consider that your report may contain some errors, but if its based on millions of rows and only a few of them are updated while selecting the result this may be "good enough". Your transaction may also fail all together since the uniqueness of a row may not be guaranteed.
A better way altogether may be to use SNAPSHOT ISOLATION LEVEL but your applications may need some adjustments to use this. One example of this is if your application takes an exclusive lock on a row to prevent others from reading it and go into edit mode in the UI. SNAPSHOT ISOLATION LEVEL does also come with a considerable performance penalty (especially on disk). But you may overcome that by throwing hardware on the problem. :)
You may also consider restoring a backup of the database to use for reporting or loading data into a data warehouse.
It can be used for a simple table, for example in an insert-only audit table, where there is no update to existing row, and no fk to other table. The insert is a simple insert, which has no or little chance of rollback.
I always use READ UNCOMMITTED now. It's fast with the least issues. When using other isolations you will almost always come across some Blocking issues.
As long as you use Auto Increment fields and pay a little more attention to inserts then your fine, and you can say goodbye to blocking issues.
You can make errors with READ UNCOMMITED but to be honest, it is very easy make sure your inserts are full proof. Inserts/Updates which use the results from a select are only thing you need to watch out for. (Use READ COMMITTED here, or ensure that dirty reads aren't going to cause a problem)
So go the Dirty Reads (Specially for big reports), your software will run smoother...

Database deadlocks

One of the classical reasons we have a database deadlock is when two transactions are inserting and updating tables in a different order.
For example, transaction A inserts in Table A then Table B.
And transaction B inserts in Table B followed by A.
Such a scenario is always at risk of a database deadlock (assuming you are not using serializable isolation level).
My questions are:
What kind of patterns do you follow in your design to make sure that all transactions are inserting and updating in the same order.
A book I was reading- had a suggestion that you can sort the statements by the name of the table. Have you done something like this or different - which would enforce that all inserts and updates are in the same order?
What about deleting records? Delete needs to start from child tables and updates and inserts need to start from parent tables. How do you ensure that this would not run into a deadlock?
All transactions are
inserting\updating in the same order.
Deletes; identify records to be
deleted outside a transaction and
then attempt the deletion in the
smallest possible transaction, e.g.
looking up by the primary key or similar
identified during the lookup stage.
Small transactions generally.
Indexing and other performance
tuning to both speed transactions
and to promote index lookups over
tablescans.
Avoid 'Hot tables',
e.g. one table with incrementing
counters for other tables primary
keys. Any other 'switchboard' type
configuration is risky.
Especially if not using Oracle, learn
the looking behaviour of the target
RDBMS in detail (optimistic /
pessimistic, isolation levels, etc.)
Ensure you do not allow row locks to
escalate to table locks as some
RDMSes will.
Deadlocks are no biggie. Just be prepared to retry your transactions on failure.
And keep them short. Short transactions consisting of queries that touch very few records (via the magic of indexing) are ideal to minimize deadlocks - fewer rows are locked, and for a shorter period of time.
You need to know that modern database engines don't lock tables; they lock rows; so deadlocks are a bit less likely.
You can also avoid locking by using MVCC and the CONSISTENT READ transaction isolation level: instead of locking, some threads will just see stale data.
Carefully design your database processes to eliminate as much as possible transactions that involve multiple tables. When I've had database design control there has never been a case of deadlock for which I could not design out the condition that caused it. That's not to say they don't exist and perhaps abound in situations outside my limited experience; but I've had no shortage of opportunities to improve designs causing these kinds of problems. One obvious strategy is to start with a chronological write-only table for insertion of new complete atomic transactions with no interdependencies, and apply their effects in an orderly asynchronous process.
Always use the database default isolation levels and locking settings unless you are absolutely sure what risks they incur, and have proven it by testing. Redesign your process if at all possible first. Then, impose the least increase in protection required to eliminate the risk (and test to prove it.) Don't increase restrictiveness "just in case" - this often leads to unintended consequences, sometimes of the type you intended to avoid.
To repeat the point from another direction, most of what you will read on this and other sites advocating the alteration of database settings to deal with transaction risks and locking problems is misleading and/or false, as demonstrated by how they conflict with each other so regularly. Sadly, especially for SQL Server, I have found no source of documentation that isn't hopelessly confusing and inadequate.
I have found that one of the best investments I ever made in avoiding deadlocks was to use a Object Relational Mapper that could order database updates. The exact order is not important, as long as every transaction writes in the same order (and deletes in exactly the reverse order).
The reason that this avoids most deadlocks out of the box is that your operations are always table A first, then table B, then table C (which perhaps depends on table B).
You can achieve a similar result as long as you exercise care in your stored procedures or data layer's access code. The only problem is that it requires great care to do it by hand, whereas a ORM with a Unit of Work concept can automate most cases.
UPDATE: A delete should run forward to verify that everything is the version you expect (you still need record version numbers or timestamps) and then delete backwards once everything verifies. As this should all happen in one transaction, the possibility of something changing out from under you shouldn't exist. The only reason for the ORM doing it backwards is to obey the key requirements, but if you do your check forward, you will have all the locks you need already in hand.
I analyze all database actions to determine, for each one, if it needs to be in a multiple statement transaction, and then for each such case, what the minimum isolation level is required to prevent deadlocks... As you said serializable will certainly do so...
Generally, only a very few database actions require a multiple statement transaction in the first place, and of those, only a few require serializable isolation to eliminate deadlocks.
For those that do, set the isolation level for that transaction before you begin, and reset it whatever your default is after it commits.
Your example would only be a problem if the database locked the ENTIRE table. If your database is doing that...run :)

Resources