SQL Server transactions / concurrency confusion - must you always use table hints? - sql-server

When you create a SQL Server transaction, what gets locked if you do not specify a table hint? In order to lock something, must you always use a table hint? Can you lock rows/tables outside of transactions (i.e. in ordinary queries)? I understand the concept of locking and why you'd want to use it, I'm just not sure about how to implement it in SQL Server, any advice appreciated.

You should use query hints only as a last resort, and even then only after expert analysis. In some cases, they will cause a query to perform badly. So, unless you really know what you are doing, avoid using query hints.
Locking (of various types) happens automatically everytime you perform a query (unless NOLOCK is specified). The default Transaction Isolation level is READ COMMITTED
What are you actually trying to do?
Understanding Locking in SQL Server

"Can you lock rows/tables outside of transactions (i.e. in ordinary queries)?"
You'd better understand that there are no ordinary queries or actions in SQL Server, they are ALL, without any exceptions, transactional. This is how ACID-ness is achieved, see, for ex., [1]. If client tools or developer interactively do not specify transaction explicitly with BEGIN TRANSACTION and COMMIT/ROLLBACK, then implicit transactions are used.
Also, transaction is not synonym of locking/locks engagement. There is a plethora of mechanisms to control concurrency without locking (for example, versioning. etc.) as well as READ UNCOMMITTED transaction "isolation" (in this case, absence of any isolation) level does not control it at all.
Update2:
In order to lock something, must you always use a table hint?
As far as, transaction isolation level is not READ UNCOMMITTED or one of row-versioning (snapshot) isolation levels, for ex., default READ COMMITTED or set by, for ex.,
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
the locks are issued (I do not know where to start and how to end this topic[2]). Table hints, which can be used in statements, override these settings.
[1]
Paul S. Randal. Understanding Logging and Recovery in SQL Server
What is Logging?
http://technet.microsoft.com/en-us/magazine/2009.02.logging.aspx#id0060003
[2]
Insert trailing ) upon clicking this link
http://en.wikipedia.org/wiki/Isolation_(database_systems)
Update:
5 min ago I had reputation 784 (the same as 24h ago) and, now, without any visible downvotes, it dropped to 779.
Where can I ask this question if I am banned from meta.stackoverflow.com?

Related

How long will dirty reads stay visible?

While running the stored procedure CSP1 on AZURE SQL 2012 (Standard: S0) I also ran concurrently
SELECT TOP 10 *
FROM [affected_table] WITH (NOLOCK)
ORDER BY 1 DESC
Which returns past 10 records in the [affected_table] including those UNCOMMITED by CSP1. When the CSP1 fails and ROLLBACK, no records had been inserted into the [affected_table] yet when I re-run the same SELECT query with NOLOCK I can still see all those phantom records there.
Shouldn't these be cleared after the ROLLBACK is finished ?
Should I be clearing the cache?
Edit
After some construtive criticsm I fell I diverged from the real question.
1) The dirty reads stays "visible" while the transaction is not commited or rolled back. Note this can take an indefined time and can be hold (bloked) longer by other concurrent transactions.
2) Sure but issuing a rollback or a commit ill not necessary act imediatly. Your (update, insert, delete) transactions issued exclusive locks and the only thing ill prevent dirty reads is using a more restrictive isolation level.
3) You never mess with the cache in production! It's ok in development and even good in some kinds of perfomance tests.
Manage concurrence is an integral part of any DBMS and it's done automaticaly without human intervention (most of time)and in fact the engine is planned with that in mind.
Of course you can change the isolation level and even try to use hints.
If your requirements needs high concurrence and allows dirty reads be aware you cannot predict how much time they ill be held.
If your cannot allow dirty reads stick with the read commited and never use nolock hint.
NOLOCK
is a common Cargo Cult used in TSQL. It was almost mandatory in the past (SQL Server 7) but with the changes in isolation architecture (SQL Server 2000) it become almost obsolete. In fact its considered a bad practice since SQL Server 2005.
If you read the linked MSDN documentation you ill see NOLOCK is the old name for READUNCOMMITTED and it allows dirty reads.
Some people believe NOLOCK magically increases performance and cures deadlocks and balding.
IMHO you can try to get rid of all NOLOCKS.
Note: If you remove a nolock and your system develops a serious case of deadlocks it merely means NOLOCK is maskering another deeper and serious problems. There are scenarios where nolock is recommended but they are rare and needs the attention of a good DBA.

Costs of reading data in transaction on SQL Server

I am copying some data from sql server to firebird over the network. Because of integrity I need to use transactions, but I am transferring about 9k of rows. Has reading in opened transaction some negative influence on reading costs, against reading in non transaction mode?
"Non-Transaction mode" simply doesn't exists, there is always a transaction, whether you declare it or not. The question is really whether is there any difference between reading in an implicit transaction vs. reading in an explicit transaction. And the short answer is no, there is no difference.
There may be some difference if you use an explicit transaction under a higher isolation level, other than the default READ_COMMITTED. It also depends whether you do anything else in an explicit transaction, but all these details cannot be infered from the frugal information in your post.
The default transaction isolation level is READ COMMITTED. It will lock the table for others while querying it.
MSDN on transaction isolation level:
http://msdn.microsoft.com/en-us/library/ms173763.aspx
I've had the issue of an occasional strange error message concerning locking deadlock. This even happened to stackoverflow - see this great article by Jeff Atwood. I strongly recommend switching to 'read committed snapshot' which solved the error + my performance issues.
http://www.codinghorror.com/blog/2008/08/deadlocked.html

Lock Question - When is an Update (U) lock issued?

We are trying to resolve a deadlock problem. The transaction that is getting rolled back is attempting to issue an Update (U) lock on a resource that another transaction has an Exclusive (X) lock on. According to Books Online (http://msdn.microsoft.com/en-us/library/ms175519.aspx), an Update lock is supposed to prevent deadlocks, not cause them.
So, my question is, why/when is an Update lock applied to a resource? We're a little confused about this because the resource that is attempting to have the Update lock applied to will not be updated by the process that is having the transaction rolled back.
Thanks for your help on this.
Randy
You are going to need to do a bit more research to find out what is actually locking, what isolation level each query is in, etc.
Some helpful resources.
SQL Server Transaction Isolation Levels and their Locks
SQL Server Lock Types and Lock Hints
There's a whole universe of "what if" behind what causes deadlocks (by which I mean, there's no way to tell from your initial post what's really going on). Could be table locks, could be locks on indexes; could be oustanding transactions you are not aware of; could be table header locks, could be tempdb issues (very unlikely), who knows?
The best method I've ever found to diagnose deadlocks works like so:
Fire up SQL Profiler, configure it with the "Deadlock Graph" and "Lock: Deadlock" events, and be sure to include the TextData column
While Profiler is running, cause your application to generate the deadlock
Select the "Deadlock Graph" profiler row, and you'll get a simple yet confusing graphic display of what's going on. This might help you figure out what's really going on.
If that doesn't help, the graphic is generated by Profiler based on a very detailed lump of XML. Extract this string (select the Deadlock Graph row, Ctrl+C to copy, paste in your text editor of choice, and delete all the columns but the XML), and then review the XML (in your preferred XML editor).
To-date, once I've gotten and worked through that XML, I've always been able to figure out what was causing the deadlock. It's a good way to learn how weird and convoluted some of SQL's internals can get.
Probably you do not handle your transaction isolation properly and are in serialized transaction mode.
You should
Use the proper transaction isolation on the connection.
Sometimes use hints on SQL, like the NoLock hint when you basically just read some data and don't do anything else with it anyway in a transaction.

SQL Server 2000 Deadlock

We are experiencing some very annoying deadlock situations in a production SQL Server 2000 database.
The main setup is the following:
SQL Server 2000 Enterprise Edition.
Server is coded in C++ using ATL OLE Database.
All database objects are being accessed trough stored procedures.
All UPDATE/INSERT stored procedures wrap their internal operations in a BEGIN TRANS ... COMMIT TRANS block.
I collected some initial traces with SQL Profiler following several articles on the Internet like this one (ignore it is referring to SQL Server 2005 tools, the same principles apply). From the traces it appears to be a deadlock between two UPDATE queries.
We have taken some measures that may have reduced the likelihood of the problem from happening as:
SELECT WITH (NOLOCK). We have changed all the SELECT queries in the stored procedures to use WITH (NOLOCK). We understand the implications of having dirty reads but the data being queried is not that important since we do a lot of automatic refreshes and under normal conditions the UI will have the right values.
READ UNCOMMITTED. We have changed the transaction isolation level on the server code to be READ UNCOMMITED.
Reduced transaction scope. We have reduced the time a transaction is being held in order to minimize the probabilities of a database deadlock to take place.
We are also questioning the fact that we have a transaction inside the majority of the stored procedures (BEGIN TRANS ... COMMIT TRANS block). In this situation my guess is that the transaction isolation level is SERIALIZABLE, right? And what about if we also have a transaction isolation level specified in the source code that calls the stored procedure, which one applies?
This is a processing intensive application and we are hitting the database a lot for reads (bigger percentage) and some writes.
If this were a SQL Server 2005 database I could go with Geoff Dalgas answer on an deadlock issue concerning Stack Overflow, if that is even applicable for the issue I am running into. But upgrading to SQL Server 2005 is not, at the present time, a viable option.
As these initial attempts failed my question is: How would you go from here? What steps would you take to reduce or even avoid the deadlock from happening, or what commands/tools should I use to better expose the problem?
A few comments:
The isolation level explicitly specified in your stored procedure overrides isolatlation level of the caller.
If sp_getapplock is available on 2000, I'd use it:
http://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/06/30/855.aspx
In many cases serializable isolation level increases the chance you get a deadlock.
A good resource for 2000:
http://www.code-magazine.com/article.aspx?quickid=0309101&page=1
Also some of Bart Duncan's advice might be applicable:
http://blogs.msdn.com/bartd/archive/2006/09/09/747119.aspx
In addition to Alex's answer:
Eyeball the code to see if tables are being accessed in the same order. We did this recently and reordered code to alway to parent then child. The system had grown, code and features were more complex, more user: we simply started getting deadlocks.
- See if transactions can be shortened (eg start later, finish earlier, less processing)
Identify which code you'd like not to fail and use SET DEADLOCK PRIORITY LOW in the other
We've used this (SQL 2005 has more options here) to make sure that some code will never be deadlocked and sacrificed other code.
If you have SELECT at the start of the transaction to prepare some stuff, consider HOLDLOCK (maybe UPDLOCK) to keep this locked for the duration. We use this occasionally so stop writes on this table by other processes.
The reason for the deadlocks in my setup scenario was after all the indexes. We were using (generated by default) non clustered indexes for the primary keys of the tables. Changing to clustered indexes fixed the problem.
My guess would be that you are experiencing deadlocks, either:
Because your DML(Updates probably) statements are getting escalations to table-locks, or
Different stored procedures are accessing the same tables in transactions but in a different order.
To address this, I would first examine the stored procedures, and make sure the the modifications statements have the indexes that they need.
Note: this applies to both the target tables and the source tables (despite NOLOCK, an UPDATE's source tables will get locks also. Check the query plans for scans on user stored procedures. Unlike batch or bulk operations, most user queries & DMLs work on a small subsets of the table rows and so should not be locking the entire table.
Then secondly, I would check the stored procedures to ensure that all data access in a stored procedure is being done in a consistent order (Parent -> Child is usually preferred).

What is "with (nolock)" in SQL Server?

Can someone explain the implications of using with (nolock) on queries, when you should/shouldn't use it?
For example, if you have a banking application with high transaction rates and a lot of data in certain tables, in what types of queries would nolock be okay? Are there cases when you should always use it/never use it?
WITH (NOLOCK) is the equivalent of using READ UNCOMMITED as a transaction isolation level. So, you stand the risk of reading an uncommitted row that is subsequently rolled back, i.e. data that never made it into the database. So, while it can prevent reads being deadlocked by other operations, it comes with a risk. In a banking application with high transaction rates, it's probably not going to be the right solution to whatever problem you're trying to solve with it IMHO.
The question is what is worse:
a deadlock, or
a wrong value?
For financial databases, deadlocks are far worse than wrong values. I know that sounds backwards, but hear me out. The traditional example of DB transactions is you update two rows, subtracting from one and adding to another. That is wrong.
In a financial database you use business transactions. That means adding one row to each account. It is of utmost importance that these transactions complete and the rows are successfully written.
Getting the account balance temporarily wrong isn't a big deal, that is what the end of day reconciliation is for. And an overdraft from an account is far more likely to occur because two ATMs are being used at once than because of a uncommitted read from a database.
That said, SQL Server 2005 fixed most of the bugs that made NOLOCK necessary. So unless you are using SQL Server 2000 or earlier, you shouldn't need it.
Further Reading
Row-Level Versioning
Unfortunately it's not just about reading uncommitted data. In the background you may end up reading pages twice (in the case of a page split), or you may miss the pages altogether. So your results may be grossly skewed.
Check out Itzik Ben-Gan's article. Here's an excerpt:
" With the NOLOCK hint (or setting the
isolation level of the session to READ
UNCOMMITTED) you tell SQL Server that
you don't expect consistency, so there
are no guarantees. Bear in mind though
that "inconsistent data" does not only
mean that you might see uncommitted
changes that were later rolled back,
or data changes in an intermediate
state of the transaction. It also
means that in a simple query that
scans all table/index data SQL Server
may lose the scan position, or you
might end up getting the same row
twice. "
The text book example for legitimate usage of the nolock hint is report sampling against a high update OLTP database.
To take a topical example. If a large US high street bank wanted to run an hourly report looking for the first signs of a city level run on the bank, a nolock query could scan transaction tables summing cash deposits and cash withdrawals per city. For such a report the tiny percentage of error caused by rolled back update transactions would not reduce the value of the report.
Not sure why you are not wrapping financial transactions in database transactions (as when you transfer funds from one account to another - you don't commit one side of the transaction at-a-time - this is why explicit transactions exist). Even if your code is braindead to business transactions as it sounds like it is, all transactional databases have the potential to do implicit rollbacks in the event of errors or failure. I think this discussion is way over your head.
If you are having locking problems, implement versioning and clean up your code.
No lock not only returns wrong values it returns phantom records and duplicates.
It is a common misconception that it always makes queries run faster. If there are no write locks on a table, it does not make any difference. If there are locks on the table, it may make the query faster, but there is a reason locks were invented in the first place.
In fairness, here are two special scenarios where a nolock hint may provide utility
1) Pre-2005 sql server database that needs to run long query against live OLTP database this may be the only way
2) Poorly written application that locks records and returns control to the UI and readers are indefinitely blocked. Nolock can be helpful here if application cannot be fixed (third party etc) and database is either pre-2005 or versioning cannot be turned on.
NOLOCK is equivalent to READ UNCOMMITTED, however Microsoft says you should not use it for UPDATE or DELETE statements:
For UPDATE or DELETE statements: This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.
http://msdn.microsoft.com/en-us/library/ms187373.aspx
This article applies to SQL Server 2005, so the support for NOLOCK exists if you are using that version. In order to future-proof you code (assuming you've decided to use dirty reads) you could use this in your stored procedures:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
You can use it when you're only reading data, and you don't really care about whether or not you might be getting back data that is not committed yet.
It can be faster on a read operation, but I cannot really say by how much.
In general, I recommend against using it - reading uncommitted data can be a bit confusing at best.
Another case where it's usually okay is in a reporting database, where data is perhaps already aged and writes just don't happen. In this case, though, the option should be set at the database or table level by the administrator by changing the default isolation level.
In the general case: you can use it when you are very sure that it's okay to read old data. The important thing to remember is that its very easy to get that wrong. For example, even if it's okay at the time you write the query, are you sure something won't change in the database in the future to make these updates more important?
I'll also 2nd the notion that it's probably not a good idea in banking app. Or inventory app. Or anywhere you're thinking about transactions.
Simple answer - whenever your SQL is not altering data, and you have a query that might interfere with other activity (via locking).
It's worth considering for any queries used for reports, especially if the query takes more than, say, 1 second.
It's especially useful if you have OLAP-type reports you're running against an OLTP database.
The first question to ask, though, is "why am I worrying about this?" ln my experience, fudging the default locking behavior often takes place when someone is in "try anything" mode and this is one case where unexpected consequences are not unlikely. Too often it's a case of premature optimization and can too easily get left embedded in an application "just in case." It's important to understand why you're doing it, what problem it solves, and whether you actually have the problem.
Short answer:
Only use WITH (NOLOCK) in SELECT statement on tables that have a clustered index.
Long answer:
WITH(NOLOCK) is often exploited as a magic way to speed up database reads.
The result set can contain rows that have not yet been committed, that are often later rolled back.
If WITH(NOLOCK) is applied to a table that has a non-clustered index then row-indexes can be changed by other transactions as the row data is being streamed into the result-table. This means that the result-set can be missing rows or display the same row multiple times.
READ COMMITTED adds an additional issue where data is corrupted within a single column where multiple users change the same cell simultaneously.
My 2 cents - it makes sense to use WITH (NOLOCK) when you need to generate reports. At this point, the data wouldn't change much & you wouldn't want to lock those records.
If you are handling finance transactions then you will never want to use nolock. nolock is best used to select from large tables that have lots updates and you don't care if the record you get could possibly be out of date.
For financial records (and almost all other records in most applications) nolock would wreak havoc as you could potentially read data back from a record that was being written to and not get the correct data.
I've used to retrieve a "next batch" for things to do. It doesn't matter in this case which exact item, and I have a lot of users running this same query.
Use nolock when you are okay with the "dirty" data. Which means nolock can also read data which is in the process of being modified and/or uncommitted data.
It's generally not a good idea to use it in high transaction environment and that is why it is not a default option on query.
I use with (nolock) hint particularly in SQLServer 2000 databases with high activity. I am not certain that it is needed in SQL Server 2005 however. I recently added that hint in a SQL Server 2000 at the request of the client's DBA, because he was noticing a lot of SPID record locks.
All I can say is that using the hint has NOT hurt us and appears to have made the locking problem solve itself. The DBA at that particular client basically insisted that we use the hint.
By the way, the databases I deal with are back-ends to enterprise medical claims systems, so we are talking about millions of records and 20+ tables in many joins. I typically add a WITH (nolock) hint for each table in the join (unless it is a derived table, in which case you can't use that particular hint)
The simplest answer is a simple question - do you need your results to be repeatable? If yes then NOLOCKS is not appropriate under any circumstances
If you don't need repeatability then nolocks may be useful, especially if you don't have control over all processes connecting to the target database.

Resources