Could someone give me some guidance on when I should use WITH (NOLOCK) as opposed to SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED?
What are the pros/cons of each? Are there any unintended consequences you've run into using one as opposed to the other?
They do the same thing. The SET TRANSACTION ISOLATION LEVEL statement applies to every table the connection touches, so if you only want dirty reads on one or two tables, use NOLOCK on those tables; otherwise set the isolation level for the session.
Both will give you dirty reads. If you are okay with that, then use them. If you can't have dirty reads, then consider snapshot or serializable hints instead.
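To make the scope difference concrete, here is a minimal sketch (the dbo.Orders table is hypothetical):

-- Session scope: every subsequent SELECT on this connection does dirty reads.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) FROM dbo.Orders;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- back to the default

-- Table scope: only this one reference to dbo.Orders skips shared locks.
SELECT COUNT(*) FROM dbo.Orders WITH (NOLOCK);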
WITH (NOLOCK) is a table-level hint. Setting the transaction isolation level to READ UNCOMMITTED will affect the whole connection. The difference is one of scope. See READUNCOMMITTED and NOLOCK in the SQL Server documentation here:
http://technet.microsoft.com/en-us/library/ms187373.aspx
For TRANSACTION ISOLATION LEVEL:
http://technet.microsoft.com/en-us/library/ms173763.aspx
NOLOCK is local to the table (or view, etc.)
READ UNCOMMITTED is per session/connection
As for guidelines... a random search from StackOverflow and the electric interweb...
Is the NOLOCK (Sql Server hint) bad practice?
When is it appropriate to use NOLOCK?
Get rid of those NOLOCK hints…
To my knowledge, the only difference is the scope of the effects, as Strommy said: the NOLOCK hint applies to a table, and READ UNCOMMITTED to the session.
As to the problems that can occur, it is all about consistency. If you care about consistency, be aware that you can get dirty reads, and anything acting on that uncommitted data may end up working from incorrect information.
I personally don't think I have seen any problems from this, but that may be more due to how I use NOLOCK. You need to be aware that there are scenarios where it is OK to use: for example, a table that is mostly append-only, with another process that comes along behind to check for a particular data condition. That is probably fine, since the main flow doesn't go back and update rows while the read is happening.
Also, these days you should look into multi-version concurrency control. I believe SQL Server added it in 2005; it helps stop writers from blocking readers by giving readers a snapshot of the data to use. I'll include a couple of links and leave further research to the reader:
MVCC
Database Isolation Levels
You cannot use SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED inside a view (a view body can contain only a single SELECT statement), so you have to use WITH (NOLOCK) there if dirty rows should be included.
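For example, a sketch of a view that bakes the hint in (the view and table names are hypothetical):

-- A SET statement is not allowed inside a view body, so the hint goes on the table itself.
CREATE VIEW dbo.vw_OpenOrders
AS
SELECT OrderID, CustomerID, Status
FROM dbo.Orders WITH (NOLOCK)
WHERE Status = 'Open';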
Since WITH (NOLOCK) has to be applied to each table, it can be annoying to write it in every FROM and JOIN clause. However, there is a reason it is called a "dirty" read: you should really know when you are doing one, and not make it the default for the whole session. Why?
Forgetting a WITH (NOLOCK) might not affect your program very dramatically; doing a dirty read where you do not want one can, in certain circumstances, make a real difference.
So use WITH (NOLOCK) if the data being selected is allowed to be incorrect, for example because it might be rolled back later. This is mostly used when you want to increase performance and the requirements of your application context allow you to take the risk of displaying inconsistent data. However, you or someone in charge has to weigh up the pros and cons of the decision to use WITH (NOLOCK).
I guess the real question is:
If I don't care about dirty reads, will adding the with (NOLOCK) hint to a SELECT statement affect the performance of:
the current SELECT statement
other transactions against the given table
Example:
Select *
from aTable with (NOLOCK)
1) Yes, a select with NOLOCK will complete faster than a normal select.
2) Yes, a select with NOLOCK will allow other queries against the affected table to complete faster than a normal select.
Why would this be?
NOLOCK typically (depending on your DB engine) means give me your data, and I don't care what state it is in, and don't bother holding it still while you read from it. It is all at once faster, less resource-intensive, and very very dangerous.
Be warned: you should never perform an update based on, or do anything system-critical or requiring absolute correctness with, data that originated from a NOLOCK read. It is absolutely possible that this data contains rows that were deleted during the query's run or that have been deleted in other sessions that have yet to be finalized. It is possible that this data includes rows that have been partially updated. It is possible that this data contains records that violate foreign key constraints. It is possible that this data excludes rows that have been added to the table but have yet to be committed.
You really have no way to know what the state of the data is.
If you're trying to get things like a Row Count or other summary data where some margin of error is acceptable, then NOLOCK is a good way to boost performance for these queries and avoid having them negatively impact database performance.
Always use the NOLOCK hint with great caution and treat any data it returns suspiciously.
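For instance, a rough row count of the kind mentioned above might look like this sketch (the table name is hypothetical):

-- Approximate count for a dashboard; the number can be off while writes are in flight.
SELECT COUNT(*) AS ApproxRows
FROM dbo.EventLog WITH (NOLOCK);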
NOLOCK makes most SELECT statements faster, because of the lack of shared locks. Also, the lack of issuance of the locks means that writers will not be impeded by your SELECT.
NOLOCK is functionally equivalent to an isolation level of READ UNCOMMITTED. The main difference is that you can use NOLOCK on some tables but not others, if you choose. If you plan to use NOLOCK on all tables in a complex query, then using SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED is easier, because you don't have to apply the hint to every table.
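For example, a sketch where only the busy table gets the hint and the lookup table is read normally (both tables are hypothetical):

SELECT o.OrderID, s.StatusName
FROM dbo.Orders AS o WITH (NOLOCK)   -- dirty reads on the large, busy table only
JOIN dbo.OrderStatus AS s            -- read under the session's normal isolation level
    ON s.StatusID = o.StatusID;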
Here is information about all of the isolation levels at your disposal, as well as table hints.
SET TRANSACTION ISOLATION LEVEL
Table Hint (Transact-SQL)
In addition to what is said above, you should be very aware that NOLOCK also carries the risk of not returning rows that were committed before your SELECT.
See http://blogs.msdn.com/sqlcat/archive/2007/02/01/previously-committed-rows-might-be-missed-if-nolock-hint-is-used.aspx
It will be faster because it doesn't have to wait for locks.
The answer is yes if the query is run multiple times at once, because each transaction won't need to wait for the others to complete. However, if the query is run once on its own, then the answer is no.
Yes. There's a significant probability that careful use of WITH(NOLOCK) will speed up your database overall. It means that other transactions won't have to wait for this SELECT statement to finish, but on the other hand, other transactions will slow down as they're now sharing their processing time with a new transaction.
Be careful to only use WITH (NOLOCK) in SELECT statements on tables that have a clustered index.
WITH(NOLOCK) is often exploited as a magic way to speed up database read transactions.
The result set can contain rows that have not yet been committed, that are often later rolled back.
If WITH(NOLOCK) is applied to a table that has a non-clustered index then row-indexes can be changed by other transactions as the row data is being streamed into the result-table. This means that the result-set can be missing rows or display the same row multiple times.
Reading at READ UNCOMMITTED (which is what NOLOCK gives you) adds a further issue: because no locks are honored, a query can return a value from a row that another session is in the middle of changing, so the value you see in a column may never have existed in any committed state.
Suppose I have a T-SQL statement like so:
BEGIN TRAN
UPDATE dbo.TableA
...
...
...
DELETE FROM dbo.TableB
COMMIT TRAN
Suppose that the update on TableA is going to take some time.
By default, would SQL Server lock TableB until the transaction is completed? Would that mean you can't read or write to it while the update is ongoing?
Short answer: NO and NO.
Long answer:
This is, in fact, a great question, as it goes deep into transaction concepts and how the engine works, but a complete answer could occupy a good part of a chapter in a good book and is out of the scope of this site.
First, keep in mind that the engine can work under several isolation levels: snapshot, read committed, etc. I recommend doing some good research on this topic (it can take a few days).
Second, the engine locks at different granularity levels (row, page, table) and will try to use the smallest one, but it can escalate on demand, depending on many factors, for example: "will this operation need a page split?"
Third, BEGIN, COMMIT, and ROLLBACK work more in a "semaphore" way, flagging how changes are being phased from "memory" to "disk". It's a lot more complicated than that, which is why I use the quotes.
That said, a "default" transaction will use row granularity under the READ COMMITTED isolation level. Nothing says how locks will be issued one way or another.
It depends on stuff like foreign keys, triggers, how much of the table is being changed, etc.
TLDR: It depends on a lot of minor details particular to your scenario. The best way to find out is by testing.
Following the comments of @Jeroen Mostert, @marc_s, and @Cato under the question, your locks on TableA and TableB here are likely to escalate to exclusive table locks, as there is no WHERE clause. If so, other read and write operations from different connections may be affected, depending on their transaction isolation level, until the end of this transaction.
Besides, locks are taken on demand: the transaction first takes locks on TableA, and only after the UPDATE has executed does it take locks on TableB.
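If you want to watch this happen, here is a sketch of a query you can run from a second connection while the transaction is still open; it uses the sys.dm_tran_locks DMV:

-- Object-level locks currently held or waited for, per session.
SELECT request_session_id,
       resource_type,
       request_mode,
       request_status,
       OBJECT_NAME(resource_associated_entity_id, resource_database_id) AS locked_object
FROM sys.dm_tran_locks
WHERE resource_type = 'OBJECT';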
In plain English, what are the disadvantages and advantages of using
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
in a query for .NET applications and reporting services applications?
This isolation level allows dirty reads. One transaction may see uncommitted changes made by some other transaction.
To maintain the highest level of isolation, a DBMS usually acquires locks on data, which may result in a loss of concurrency and a high locking overhead. This isolation level relaxes this property.
You may want to check out the Wikipedia article on READ UNCOMMITTED for a few examples and further reading.
You may also be interested in checking out Jeff Atwood's blog article on how he and his team tackled a deadlock issue in the early days of Stack Overflow. According to Jeff:
But is nolock dangerous? Could you end up reading invalid data with read uncommitted on? Yes, in theory. You'll find no shortage of database architecture astronauts who start dropping ACID science on you and all but pull the building fire alarm when you tell them you want to try nolock. It's true: the theory is scary. But here's what I think: "In theory there is no difference between theory and practice. In practice there is."
I would never recommend using nolock as a general "good for what ails you" snake oil fix for any database deadlocking problems you may have. You should try to diagnose the source of the problem first.
But in practice adding nolock to queries that you absolutely know are simple, straightforward read-only affairs never seems to lead to problems... As long as you know what you're doing.
One alternative to the READ UNCOMMITTED level that you may want to consider is the READ COMMITTED SNAPSHOT. Quoting Jeff again:
Snapshots rely on an entirely new data change tracking method ... more than just a slight logical change, it requires the server to handle the data physically differently. Once this new data change tracking method is enabled, it creates a copy, or snapshot of every data change. By reading these snapshots rather than live data at times of contention, Shared Locks are no longer needed on reads, and overall database performance may increase.
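If you want to try that route, turning it on is a database-level setting; a sketch (the database name is hypothetical, and switching it on needs a moment with no other active connections in the database):

-- After this, the default READ COMMITTED level reads row versions instead of
-- taking shared locks, so plain SELECTs stop blocking behind writers.
ALTER DATABASE MyAppDb SET READ_COMMITTED_SNAPSHOT ON;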
My favorite use case for READ UNCOMMITTED is to debug something that is happening inside a transaction.
Start your software under a debugger; as you step through the lines of code, it opens a transaction and modifies your database. While the code is stopped, you can open a query window, set the READ UNCOMMITTED isolation level, and run queries to see what is going on.
You can also use it to see whether long-running procedures are stuck or are correctly updating your database, using a query with COUNT(*).
It is great if your company loves to make overly complex stored procedures.
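A minimal sketch of what that second query window might run (the table and column names are hypothetical):

-- While the debugged application is paused with its transaction still open:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

-- How far has the procedure / transaction got so far?
SELECT COUNT(*) FROM dbo.StagingRows;
SELECT TOP (50) * FROM dbo.StagingRows ORDER BY StagingRowID DESC;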
This can be useful for seeing the progress of long INSERT queries, making rough estimates (like a COUNT(*) or a rough SUM), etc.
In other words, the results the dirty read queries return are fine as long as you treat them as estimates and don't make any critical decisions based upon them.
The advantage is that it can be faster in some situations. The disadvantage is the result can be wrong (data which hasn't been committed yet could be returned) and there is no guarantee that the result is repeatable.
If you care about accuracy, don't use this.
More information is on MSDN:
Implements dirty read, or isolation level 0 locking, which means that no shared locks are issued and no exclusive locks are honored. When this option is set, it is possible to read uncommitted or dirty data; values in the data can be changed and rows can appear or disappear in the data set before the end of the transaction. This option has the same effect as setting NOLOCK on all tables in all SELECT statements in a transaction. This is the least restrictive of the four isolation levels.
When is it ok to use READ UNCOMMITTED?
Rule of thumb
Good: Big aggregate reports showing constantly changing totals.
Risky: Nearly everything else.
The good news is that the majority of read-only reports fall in that Good category.
More detail...
Ok to use it:
Nearly all user-facing aggregate reports for current, non-static data, e.g. year-to-date sales (a sketch appears at the end of this answer).
It risks a margin of error (maybe < 0.1%), which is much lower than other uncertainty factors such as input error or simply the randomness of when exactly data gets recorded minute to minute.
That covers probably the majority of what a Business Intelligence department would do in, say, SSRS. The exception, of course, is anything with $ signs in front of it. Many people account for money with far more zeal than they apply to the related core metrics required to service the customer and generate that money. (I blame accountants.)
When risky
Any report that goes down to the detail level. If that detail is required, it usually implies that every row will be relevant to a decision. In fact, if you can't pull a small subset without blocking, it might be for the good reason that it's currently being edited.
Historical data. It rarely makes a practical difference, but whereas users understand that constantly changing data can't be perfect, they don't feel the same about static data. Dirty reads won't hurt here, but double reads occasionally can. Seeing as you shouldn't have blocks on static data anyway, why risk it?
Nearly anything that feeds an application which also has write capabilities.
When even the OK scenario is not OK.
Are any applications or update processes making use of big single transactions? Ones which remove then re-insert a lot of records you're reporting on? In that case you really can't use NOLOCK on those tables for anything.
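As a sketch of the "Good" category above, a year-to-date aggregate read dirty (table and column names are hypothetical):

-- A small margin of error is acceptable here, and the report no longer
-- competes for shared locks with the order-entry workload.
SELECT SUM(LineTotal) AS YtdSales
FROM dbo.OrderLines WITH (NOLOCK)
WHERE OrderDate >= DATEFROMPARTS(YEAR(GETDATE()), 1, 1);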
Use READ UNCOMMITTED in situations where the source is highly unlikely to change.
When reading historical data, e.g. deployment logs from two days ago.
When reading metadata, e.g. in a metadata-driven application.
Don't use READ UNCOMMITTED when you know the source may change during the fetch operation.
Regarding reporting, we use it on all of our reporting queries to prevent a query from bogging down databases. We can do that because we're pulling historical data, not up-to-the-microsecond data.
This will give you dirty reads, showing you transactions that have not been committed yet. That is the most obvious answer. I don't think it's a good idea to use this just to speed up your reads; there are other ways of doing that if you use a good database design.
It's also interesting to note what is not happening. READ UNCOMMITTED does not just ignore other transactions' locks; it also takes no shared locks of its own.
Consider that you are generating a large report, or migrating data out of your database, using a large and possibly complex SELECT statement. This will take shared locks that may be escalated to a shared table lock for the duration of your transaction. Other transactions may read from the table, but updates are impossible. This may be a bad idea if it's a production database, since production may stop completely.
If you are using READ UNCOMMITTED, you will not take a shared lock on the table. You may or may not see the results of new transactions, depending on where in the table the data was inserted and how far your SELECT has already read. You may also get the same data twice if, for example, a page split occurs (the data is copied to another location in the data file).
So, if it's very important to you that data can be inserted while your SELECT is running, READ UNCOMMITTED may make sense. You have to accept that your report may contain some errors, but if it's based on millions of rows and only a few of them are updated while the result is being selected, this may be "good enough". Your transaction may also fail altogether, since the uniqueness of a row may not be guaranteed.
A better way altogether may be to use the SNAPSHOT isolation level, but your applications may need some adjustments to use it. One example is an application that takes an exclusive lock on a row to prevent others from reading it while it goes into edit mode in the UI. SNAPSHOT isolation also comes with a considerable performance penalty (especially on tempdb, where the row versions are kept), but you may be able to overcome that by throwing hardware at the problem. :)
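A sketch of opting in to snapshot isolation (the database and table names are hypothetical):

-- One-time, database-level switch...
ALTER DATABASE MyAppDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- ...then each reader opts in per session. It sees a consistent snapshot,
-- blocks no writers, and reads no uncommitted data.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN;
SELECT COUNT(*) FROM dbo.Orders;
COMMIT TRAN;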
You may also consider restoring a backup of the database to use for reporting or loading data into a data warehouse.
It can be used for a simple table, for example an insert-only audit table where there are no updates to existing rows and no foreign keys to other tables. The insert is a simple insert, with little or no chance of being rolled back.
I always use READ UNCOMMITTED now. It's fast, with the fewest issues. When using other isolation levels you will almost always come across some blocking issues.
As long as you use auto-increment fields and pay a little more attention to inserts, you're fine, and you can say goodbye to blocking issues.
You can make errors with READ UNCOMMITTED, but to be honest, it is very easy to make sure your inserts are foolproof. Inserts/updates that use the results of a SELECT are the only thing you need to watch out for. (Use READ COMMITTED there, or ensure that dirty reads aren't going to cause a problem.)
So go with dirty reads (especially for big reports); your software will run more smoothly...
How do you guys decide that you should be wrapping the SQL in a transaction?
Please throw some light on this.
Cheers !!
A transaction should be used when you need a set of changes to be processed completely to consider the operation complete and valid. In other words, if only a portion executes successfully, will that result in incomplete or invalid data being stored in your database?
For example, if you have an insert followed by an update, what happens if the insert succeeds and the update fails? If that would result in incomplete data (in this case, an orphaned record), you should wrap the two statements in a transaction to get them to complete as a "set".
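A sketch of that insert-plus-update pair wrapped so that both succeed or both roll back (table and column names are hypothetical):

DECLARE @CustomerID int = 42;

BEGIN TRY
    BEGIN TRAN;

    INSERT INTO dbo.Orders (CustomerID, OrderDate)
    VALUES (@CustomerID, GETDATE());

    UPDATE dbo.Customers
    SET LastOrderDate = GETDATE()
    WHERE CustomerID = @CustomerID;

    COMMIT TRAN;        -- both changes become visible together
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRAN;  -- neither change is kept, so no orphaned record
    THROW;              -- surface the original error to the caller
END CATCH;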
If you are executing two or more statements that you expect to be functionally atomic, you should wrap them in a transaction.
If you have more than a single data-modifying statement to execute to complete a task, all of them should be within a transaction.
This way, if the first one succeeds but any of the following ones hits an error, you can roll back (undo) everything as if nothing was ever done.
Whenever you wouldn't like it if part of the operation can complete and part of it doesn't.
Anytime you want to lock up your database and potentially crash your production application, anytime you want to litter your application with hidden scalability nightmares, go ahead and create a transaction. Make it big, make it slow, and put a loop inside.
Seriously, none of the above answers acknowledge the trade-off and potential problems that come with heavy use of transactions. Be careful, and consider the risk/reward each time.
eBay doesn't use them at all. I'm sure there are many others.
http://www.infoq.com/interviews/dan-pritchett-ebay-architecture
Whenever an operation needs to satisfy the ACID (Atomicity, Consistency, Isolation, Durability) criteria, you should use a transaction.
Read this article
When you want to use the atomicity or isolation properties of the database for a set of changes.
Atomicity: an atomic transaction is an indivisible and irreducible series of database operations such that either all of them occur, or none does (according to Wikipedia).
Isolation: isolation determines how transaction integrity is visible to other users and systems (according to Wikipedia).
One of the classical reasons we have a database deadlock is when two transactions are inserting and updating tables in a different order.
For example, transaction A inserts in Table A then Table B.
And transaction B inserts in Table B followed by A.
Such a scenario is always at risk of a database deadlock (assuming you are not using serializable isolation level).
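To make the risk concrete, here is a sketch of the interleaving that deadlocks (TableA and TableB follow the example above; the Qty and Id columns are hypothetical):

-- Session 1
BEGIN TRAN;
UPDATE dbo.TableA SET Qty = Qty + 1 WHERE Id = 1;   -- Session 1 now holds a lock on A
-- ...meanwhile Session 2 has run:
--     BEGIN TRAN;
--     UPDATE dbo.TableB SET Qty = Qty + 1 WHERE Id = 1;   -- holds a lock on B
--     UPDATE dbo.TableA SET Qty = Qty + 1 WHERE Id = 1;   -- waits on Session 1
UPDATE dbo.TableB SET Qty = Qty + 1 WHERE Id = 1;   -- waits on Session 2: deadlock,
COMMIT TRAN;                                        -- and one session is rolled back as the victim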
My questions are:
What kind of patterns do you follow in your design to make sure that all transactions insert and update in the same order?
A book I was reading suggested that you can sort the statements by the name of the table. Have you done something like this, or something different, that would enforce that all inserts and updates happen in the same order?
What about deleting records? Deletes need to start from the child tables, while updates and inserts need to start from the parent tables. How do you ensure that this does not run into a deadlock?
All transactions insert/update in the same order.
Deletes: identify the records to be deleted outside a transaction, then attempt the deletion in the smallest possible transaction, e.g. looking up by the primary key or similar identified during the lookup stage.
Small transactions generally.
Indexing and other performance tuning, both to speed up transactions and to promote index lookups over table scans.
Avoid "hot tables", e.g. one table with incrementing counters for other tables' primary keys. Any other "switchboard" type configuration is risky.
Especially if not using Oracle, learn the locking behaviour of the target RDBMS in detail (optimistic/pessimistic, isolation levels, etc.).
Ensure you do not allow row locks to escalate to table locks, as some RDBMSes will.
Deadlocks are no biggie. Just be prepared to retry your transactions on failure.
And keep them short. Short transactions consisting of queries that touch very few records (via the magic of indexing) are ideal to minimize deadlocks - fewer rows are locked, and for a shorter period of time.
You need to know that modern database engines don't lock tables; they lock rows, so deadlocks are a bit less likely.
You can also avoid locking by using MVCC and the CONSISTENT READ transaction isolation level: instead of locking, some threads will just see stale data.
Carefully design your database processes to eliminate, as far as possible, transactions that involve multiple tables. When I've had database design control there has never been a case of deadlock for which I could not design out the condition that caused it. That's not to say they don't exist and perhaps abound in situations outside my limited experience; but I've had no shortage of opportunities to improve designs causing these kinds of problems. One obvious strategy is to start with a chronological write-only table for insertion of new complete atomic transactions with no interdependencies, and apply their effects in an orderly asynchronous process.
Always use the database default isolation levels and locking settings unless you are absolutely sure what risks they incur, and have proven it by testing. Redesign your process if at all possible first. Then, impose the least increase in protection required to eliminate the risk (and test to prove it.) Don't increase restrictiveness "just in case" - this often leads to unintended consequences, sometimes of the type you intended to avoid.
To repeat the point from another direction, most of what you will read on this and other sites advocating the alteration of database settings to deal with transaction risks and locking problems is misleading and/or false, as demonstrated by how they conflict with each other so regularly. Sadly, especially for SQL Server, I have found no source of documentation that isn't hopelessly confusing and inadequate.
I have found that one of the best investments I ever made in avoiding deadlocks was to use an Object-Relational Mapper that could order database updates. The exact order is not important, as long as every transaction writes in the same order (and deletes in exactly the reverse order).
The reason that this avoids most deadlocks out of the box is that your operations are always table A first, then table B, then table C (which perhaps depends on table B).
You can achieve a similar result as long as you exercise care in your stored procedures or data layer's access code. The only problem is that it requires great care to do it by hand, whereas an ORM with a Unit of Work concept can automate most cases.
UPDATE: A delete should run forward to verify that everything is the version you expect (you still need record version numbers or timestamps) and then delete backwards once everything verifies. As this should all happen in one transaction, the possibility of something changing out from under you shouldn't exist. The only reason for the ORM doing it backwards is to obey the key requirements, but if you do your check forward, you will have all the locks you need already in hand.
I analyze all database actions to determine, for each one, whether it needs to be in a multiple-statement transaction, and then, for each such case, what minimum isolation level is required to prevent deadlocks... As you said, serializable will certainly do so...
Generally, only a very few database actions require a multiple-statement transaction in the first place, and of those, only a few require serializable isolation to eliminate deadlocks.
For those that do, set the isolation level for that transaction before you begin, and reset it to whatever your default is after it commits.
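A sketch of that pattern, assuming READ COMMITTED is your session default (table and column names are hypothetical):

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN;
    -- the multi-statement work that needs the stronger guarantee
    UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountID = 1;
    UPDATE dbo.Accounts SET Balance = Balance + 100 WHERE AccountID = 2;
COMMIT TRAN;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;   -- back to the session default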
Your example would only be a problem if the database locked the ENTIRE table. If your database is doing that...run :)