We're using a SQL Server 2005 database (no row versioning) with a huge select statement, and we're seeing it block other statements from running (seen using sp_who2). I didn't realise SELECT statements could cause blocking - is there anything I can do to mitigate this?
SELECT can block updates. A properly designed data model and query will only cause minimal blocking and not be an issue. The 'usual' WITH NOLOCK hint is almost always the wrong answer. The proper answer is to tune your query so it does not scan huge tables.
If the query cannot be tuned, you should first consider the SNAPSHOT isolation level, second consider using DATABASE SNAPSHOTS, and only as a last resort consider DIRTY READS (and even then it is better to change the isolation level than to use the NOLOCK hint). Note that dirty reads, as the name clearly states, will return inconsistent data (e.g. your total sheet may be unbalanced).
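For reference, a minimal sketch of each option in T-SQL; the database name SalesDb, the logical file name SalesDb_Data, and the snapshot file path are assumptions for illustration:
-- Option 1: snapshot isolation (readers use row versions instead of shared locks)
ALTER DATABASE SalesDb SET ALLOW_SNAPSHOT_ISOLATION ON;
-- Option 2: a database snapshot; point the big report query at the snapshot
CREATE DATABASE SalesDb_Reporting
    ON (NAME = SalesDb_Data, FILENAME = 'C:\Data\SalesDb_Reporting.ss')
    AS SNAPSHOT OF SalesDb;
-- Option 3 (last resort): dirty reads for the current session
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;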
From the documentation:
Shared (S) locks allow concurrent transactions to read (SELECT) a resource under pessimistic concurrency control. For more information, see Types of Concurrency Control. No other transactions can modify the data while shared (S) locks exist on the resource. Shared (S) locks on a resource are released as soon as the read operation completes, unless the transaction isolation level is set to repeatable read or higher, or a locking hint is used to retain the shared (S) locks for the duration of the transaction.
A shared lock is compatible with another shared lock or an update lock, but not with an exclusive lock.
That means that your SELECT queries will block UPDATE and INSERT queries and vice versa.
A SELECT query will place a temporary shared lock when it reads a block of values from the table, and remove it when it is done reading.
For as long as the lock exists, other transactions cannot modify the data in the locked area.
Two SELECT queries will never block each other (unless one of them requests update locks, e.g. via the UPDLOCK hint - SQL Server's closest equivalent of SELECT FOR UPDATE).
You can enable the SNAPSHOT isolation level on your database (ALLOW_SNAPSHOT_ISOLATION) and use it: queries running under SNAPSHOT read row versions instead of taking shared locks, so your SELECT queries will no longer block UPDATE queries (which seems to be your case), nor be blocked by them.
Note that each session must opt in with SET TRANSACTION ISOLATION LEVEL SNAPSHOT; turning on the database option by itself does not change how existing queries behave.
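A minimal sketch of the opt-in, assuming ALLOW_SNAPSHOT_ISOLATION is already ON for the database and a hypothetical Orders table:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN;
-- Reads a transactionally consistent version of the rows; takes no shared locks,
-- so concurrent UPDATEs are neither blocked nor blocking.
SELECT * FROM Orders;
COMMIT;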
Also note that SQL Server, unlike Oracle, uses a lock manager and keeps its locks in an in-memory linked list.
That means that under heavy load, the mere act of placing and removing a lock may be slow, since the linked list must itself be locked by the transaction thread.
To perform dirty reads you can either:
using (new TransactionScope(TransactionScopeOption.Required,
    new TransactionOptions { IsolationLevel = System.Transactions.IsolationLevel.ReadUncommitted }))
{
    // Your code here
}
or
SelectCommand = "SELECT * FROM Table1 WITH (NOLOCK) INNER JOIN Table2 WITH (NOLOCK) ..."
Remember that you have to write WITH (NOLOCK) after every table you want to dirty read.
You could set the transaction isolation level to READ UNCOMMITTED.
You might also get deadlocks:
"deadlocks involving only one table"
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/01/reproducing-deadlocks-involving-only-one-table.aspx
and/or incorrect results:
"Selects under READ COMMITTED and REPEATABLE READ may return incorrect results."
http://www2.sqlblog.com/blogs/alexander_kuznetsov/archive/2009/04/10/selects-under-read-committed-and-repeatable-read-may-return-incorrect-results.aspx
You can use the WITH (READPAST) table hint. It's different from WITH (NOLOCK): rather than returning uncommitted data, it simply skips any rows that are currently locked. It will not block anyone and will not read dirty data, but it may omit locked rows from the result.
SELECT * FROM table1 WITH (READPAST)
Related
In Oracle databases I can start a transaction and update a row without committing. Selecting this row in another session still returns the current ("old") value.
How to get this behaviour in SQL Server? Currently, the row is locked until the transaction is ended. WITH (NOLOCK) inside the select statement gives the new value from the uncommitted transaction which is potentially dangerous.
Starting the transaction without committing:
BEGIN TRAN;
UPDATE test SET val = 'Updated' WHERE id = 1;
This works:
SELECT * FROM test WHERE id = 2;
This waits for the transaction to be committed:
SELECT * FROM test WHERE id = 1;
With Read Committed Snapshot Isolation (RCSI), versions of rows are stored in a version store, so readers can read the version of a row that existed at the time the statement started and before any changes were made - while a transaction is open, without taking shared locks on rows or pages, and without blocking writers or other readers. From this post by Paul White:
To summarize, locking read committed sees each row as it was at the time it was briefly locked and physically read; RCSI sees all rows as they were at the time the statement began. Both implementations are guaranteed to never see uncommitted data.
One cost, of course, is that if you read a prior version of the row, it can change (even many times) before you're done doing whatever it is you plan to do with it. If you're making important decisions based on some past version of the row, it may be the case that you actually want an isolation level that forces you to wait until all changes have been committed.
Another cost is that the version store is not free... it requires space and I/O in tempdb, so if tempdb is already a bottleneck on your system, this is something worth testing.
(In SQL Server 2019, with Accelerated Database Recovery, the version store shifts to the user database, which increases database size but mitigates some of the tempdb contention.)
Paul's post goes on to explain some other risks and caveats.
In almost all cases, this is still way better than NOLOCK, IMHO. Lots of links about the dangers there (and why RCSI is better) here:
I'm using NOLOCK; is that bad?
And finally, from the documentation (adding one clarification from the comments):
When the READ_COMMITTED_SNAPSHOT database option is set ON, read committed isolation uses row versioning to provide statement-level read consistency. Read operations require only SCH-S table level locks and no page or row locks. That is, the SQL Server Database Engine uses row versioning to present each statement with a transactionally consistent snapshot of the data as it existed at the start of the statement. Locks are not used to protect the data from updates by other transactions. A user-defined function can return data that was committed after the time the statement containing the UDF began.
When the READ_COMMITTED_SNAPSHOT database option is set OFF, which is the default setting *on-prem but not in Azure SQL Database*, read committed isolation uses shared locks to prevent other transactions from modifying rows while the current transaction is running a read operation. The shared locks also block the statement from reading rows modified by other transactions until the other transaction is completed. Both implementations meet the ISO definition of read committed isolation.
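To get the Oracle-like behavior from the question above, enabling RCSI is enough. A sketch, reusing the test table from the example (the database name TestDb is an assumption):
ALTER DATABASE TestDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;
-- Session 1: open transaction with an uncommitted update
BEGIN TRAN;
UPDATE test SET val = 'Updated' WHERE id = 1;
-- Session 2: no longer waits; it returns the last committed ("old") value
SELECT * FROM test WHERE id = 1;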
I'm wondering what is the benefit to use SELECT WITH (NOLOCK) on a table if the only other queries affecting that table are SELECT queries.
How is that handled by SQL Server? Would a SELECT query block another SELECT query?
I'm using SQL Server 2012 and a Linq-to-SQL DataContext.
(EDIT)
About performance:
Would a 2nd SELECT have to wait for a 1st SELECT to finish when using a locking SELECT?
Versus a SELECT WITH (NOLOCK)?
A SELECT in SQL Server will place a shared lock on a table row - and a second SELECT would also require a shared lock, and those are compatible with one another.
So no - one SELECT cannot block another SELECT.
What the WITH (NOLOCK) query hint is used for is to be able to read data that's in the process of being inserted (by another connection) and that hasn't been committed yet.
Without that query hint, a SELECT might be blocked reading a table by an ongoing INSERT (or UPDATE) statement that places an exclusive lock on rows (or possibly a whole table), until that operation's transaction has been committed (or rolled back).
The problem with the WITH (NOLOCK) hint is that you might be reading data rows that will never actually be committed (if the INSERT transaction is rolled back) - so e.g. your report might show data that was never really committed to the database.
There's another query hint that might be useful - WITH (READPAST). This instructs the SELECT command to just skip any rows that it attempts to read and that are locked exclusively. The SELECT will not block, and it will not read any "dirty" un-committed data - but it might skip some rows, e.g. not show all your rows in the table.
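A two-session sketch contrasting the behaviors (the table, columns, and values are hypothetical):
-- Session 1: an open, uncommitted INSERT
BEGIN TRAN;
INSERT INTO Users (UserID, Name) VALUES (42, 'Pending');

-- Session 2:
SELECT * FROM Users;                  -- blocks until session 1 commits or rolls back
SELECT * FROM Users WITH (NOLOCK);    -- returns immediately, including the uncommitted row
SELECT * FROM Users WITH (READPAST);  -- returns immediately, skipping the locked row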
On performance: you keep focusing on the SELECT side.
A shared lock does not block reads.
A shared lock blocks updates.
If you have hundreds of shared locks, it is going to take an update a while to get an exclusive lock, as it must wait for the shared locks to clear.
By default a select (read) takes a shared lock.
Shared (S) locks allow concurrent transactions to read (SELECT) a resource.
A shared lock has no effect on other SELECTs (1 or 1000).
The difference is in how NOLOCK versus a shared lock affects UPDATE or INSERT operations.
No other transactions can modify the data while shared (S) locks exist on the resource.
A shared lock blocks an update!
But nolock does not block an update.
This can have a huge impact on the performance of updates. It also impacts inserts.
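A minimal two-session sketch of that blocking (the Accounts table is hypothetical; REPEATABLE READ is used so the shared locks are held long enough to observe):
-- Session 1: hold shared locks until COMMIT
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRAN;
SELECT * FROM Accounts;  -- takes S locks and keeps them

-- Session 2: waits for session 1's shared locks to clear
UPDATE Accounts SET Balance = Balance + 10 WHERE AccountID = 1;

-- Session 1:
COMMIT;  -- session 2's UPDATE now proceeds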
Dirty read (nolock) just sounds dirty. You are never going to get partial data. If an update is changing John to Sally you are never going to get Jolly.
I use shared locks a lot for concurrency. Data is stale as soon as it is read. A read of John that changes to Sally the next millisecond is stale data. A read of Sally that gets rolled back to John the next millisecond is stale data. That is on the millisecond level. I have a dataloader that takes 20 hours to run if users are taking shared locks, and 4 hours to run if users are taking no locks. Shared locks in this case cause data to be 16 hours stale.
Don't use NOLOCK in the wrong places - it does have its place. If you are going to cut a check when a byte is set to 1, and then set it to 2 when the check is cut, that is not a time for NOLOCK.
I have to add one important comment. Everyone is mentioning that NOLOCK reads only dirty data. This is not precise. It is also possible that you'll get the same row twice, or that a whole row is skipped during your read. The reason is that you could ask for some data at the same time that SQL Server is rebalancing the B-tree.
Check these other threads:
https://stackoverflow.com/a/5469238/2108874
http://www.sqlmag.com/article/sql-server/quaere-verum-clustered-index-scans-part-iii.aspx
With the NOLOCK hint (or setting the isolation level of the session to READ UNCOMMITTED) you tell SQL Server that you don't expect consistency, so there are no guarantees. Bear in mind though that "inconsistent data" does not only mean that you might see uncommitted changes that were later rolled back, or data changes in an intermediate state of the transaction. It also means that in a simple query that scans all table/index data SQL Server may lose the scan position, or you might end up getting the same row twice.
At my work, we have a very big system that runs on many PCs at the same time, with very big tables with hundreds of thousands of rows, and sometimes many millions of rows.
When you make a SELECT on a very big table, let's say you want to know every transaction a user has made in the past 10 years, and the primary key of the table is not built in an efficient way, the query might take several minutes to run.
Then, our application might be running on many users' PCs at the same time, accessing the same database. So if someone tries to insert into the table that the other SELECT is reading (in pages that SQL is trying to read), a LOCK can occur and the two transactions block each other.
We had to add WITH (NOLOCK) to our SELECT statement, because it was a huge SELECT on a table that is used a lot by a lot of users at the same time, and we had LOCKS all the time.
I hope my example is clear enough; this is a real-life example.
The SELECT WITH (NOLOCK) allows reads of uncommitted data, which is equivalent to running the query under the READ UNCOMMITTED isolation level. The NOLOCK hint allows finer-grained control than setting the isolation level for the entire session.
Wikipedia has a useful article: Wikipedia: Isolation (database systems)
It is also discussed at length in other Stack Overflow questions.
SELECT with NOLOCK will return records that may or may not ever be committed - you will read dirty data.
For example, say a transaction inserts 1000 rows and then fails and rolls back.
If you SELECT with NOLOCK during that window, you will get the 1000 rows.
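A sketch of that scenario (the table and columns are hypothetical):
-- Session 1: inserts rows inside a transaction that will fail
BEGIN TRAN;
INSERT INTO Orders (OrderID, Amount) VALUES (1, 100.00);  -- ...imagine 1000 such rows

-- Session 2: the dirty read sees the uncommitted rows
SELECT COUNT(*) FROM Orders WITH (NOLOCK);  -- counts rows that may never exist

-- Session 1:
ROLLBACK;  -- the rows session 2 counted are gone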
In my database, the tempOrder table has millions of records, and 10 triggers update tempOrder whenever another table is updated.
So I want to apply WITH (NOLOCK) to the table.
I know I can do it per query:
SELECT * FROM tempOrder WITH (NOLOCK)
But is there any way to apply WITH (NOLOCK) directly to the table itself in SQL Server 2008?
The direct answer to your question is NO - there is no option to tell SQL Server to never lock tableX. With that said, your question opens up a whole series of things that should be brought up.
Isolation Level
First, the most direct way you can accomplish what you want is to use the WITH (NOLOCK) hint or SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED. These apply to a single query or to the duration of the connection, respectively. If I chose this route, I would combine it with a long-running SQL Profiler trace to identify any queries taking locks on TableX.
Lock Escalation
Second, SQL Server has a table-level LOCK_ESCALATION setting (ALTER TABLE ... SET (LOCK_ESCALATION = AUTO | TABLE | DISABLE)). This controls whether SQL Server attempts to consolidate many fine-grained locks on a single database object (think index) into fewer coarse-grained locks once enough have accumulated.
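For example (the table name is hypothetical):
-- Prevent SQL Server from escalating row/page locks on this table to a table lock
ALTER TABLE dbo.TableX SET (LOCK_ESCALATION = DISABLE);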
Overriding SQL's lock escalation generally isn't a good idea. As the documentation states:
In most cases, the Database Engine delivers the best performance when operating with its default settings for locking and lock escalation.
As counterintuitive as it may seem, in the scenario you described you might have some luck with fewer, broader locks instead of NOLOCK. You'll need to test this theory with a real workload to determine whether it's worthwhile.
Snapshot Isolation
You might also check out the SNAPSHOT isolation level. There isn't enough information in your question to know, but I suspect it would help.
Dangers of NOLOCK
With that said, as you might have picked up from #GSerg's comment, NOLOCK can be evil. NOLOCK is colloquially referred to as Chaos - and for good reason. When developers first encounter NOLOCK, it seems like allowing dirty reads is the only implication. There are more...
Dirty data is read, giving inconsistent results (the common impression)
Wrong data - neither consistent with the pre-write nor the post-write state of your data
Hard exceptions (like error 601 due to data movement) that terminate your query
Blank data is returned
Previously committed rows are missed
Malformed bytes are returned
But don't take my word for it:
Actual Email: "NoLOCK is the epitome of evil?"
SQL Server NOLOCK hint & other poor ideas
Is the nolock hint a bad practice
This is not a table configuration.
If you add (NOLOCK) to the query (it is called a query hint), you are saying that when executing this (and only this) query, it won't take locks on the affected tables.
Of course, you can make this behavior the default for the current connection by setting the transaction isolation level, for example: SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED. But again, it is valid only while that connection is open.
Perhaps if you explain in more details what you are trying to achieve, we can better help you.
You cannot change the default isolation level (except for snapshot) for a table or a database; however, you can change it for all queries on the current connection:
set transaction isolation level read uncommitted
See MSDN for more information.
I have a stored procedure that performs a bulk insert in a table. I added BEGIN TRANSACTION command just above the INSERT query to enable ROLL BACK if something goes wrong. When the bulk insert initiated, it locked the entire table and other users were unable to execute SELECT on the same table.
I do not follow why SQL Server locks the entire table even for a SELECT.
I am using SQL Server 2005 Express. Is this a problem with this version, or does it persist in 2008 as well? How do I overcome this situation? Writers should not block readers.
Writers should not block Readers
This is true only for snapshot isolation; all other isolation levels require both readers to block writers and writers to block readers (dirty reads not considered, since they are inconsistent and should never be used). If you need this behavior, use row versioning (the link contains the solution).
Why does bulk insert lock the entire table?
This actually may or may not be true. The behavior is under your control:
TABLOCK
Specifies that a table-level lock is acquired for the duration of the bulk-import operation. A table can be loaded concurrently by multiple clients if the table has no indexes and TABLOCK is specified. By default, locking behavior is determined by the table option "table lock on bulk load".
For more details, read the product specifications: Controlling Locking Behavior for Bulk Import.
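A sketch of both knobs (the table name and file path are assumptions):
-- Per-statement: omit TABLOCK so the bulk import takes finer-grained locks
BULK INSERT dbo.Orders
FROM 'C:\data\orders.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');  -- no TABLOCK here

-- Per-table default, used when the statement doesn't specify
EXEC sp_tableoption 'dbo.Orders', 'table lock on bulk load', 'OFF';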
You have an open transaction. That means SQL Server needs to preserve the state of the table, and any changes you are in the process of making are "dirty" and uncommitted.
If you SELECT from a table that is currently being altered with an open (explicit) transaction, the SELECT will wait until the table is in a stable state and the transaction has been either committed or rolled back.
To get around this, you can alter the transaction isolation level on the SELECT query.
If you're specifying TABLOCK in your proc, don't.
I have a query that runs each night on a table with a bunch of records (200,000+). This application simply iterates over the results (using a DbDataReader in a C# app if that's relevant) and processes each one. The processing is done outside of the database altogether. During the time that the application is iterating over the results I am unable to insert any records into the table that I am querying for. The insert statements just hang and eventually timeout. The inserts are done in completely separate applications.
Does SQL Server lock the table down while a query is being done? This seems like an overly aggressive locking policy. I could understand how there could be a conflict between the query and newly inserted records, but I would be perfectly ok if records inserted after the query started were simply not included in the results.
Any ways to avoid this?
Update:
The WITH (NOLOCK) definitely did the trick. As some of you pointed out, this isn't the cleanest approach. I can't really query everything into memory given the number of records, and some of the columns in this table are binary (some records are actually about 1MB of total data).
The other suggestion was to query for batches of records at a time. This isn't a bad idea either, but it does bring up a new issue: database-independent queries. Right now the application can work with a variety of different databases (Oracle, MySQL, Access, etc). Each database has its own way of limiting the rows returned in a query. But maybe this is better saved for another question?
Back on topic, the WITH (NOLOCK) clause is certainly SQL Server specific - is there any way to keep it out of my query (so the query still works with other databases)? Maybe I could somehow specify a parameter on the DbCommand object? Or can I specify the locking policy at the database level? That is, change some properties in SQL Server itself so that the table doesn't lock like this by default?
If you're using SQL Server 2005+, then how about giving the new MVCC snapshot isolation a try. I've had good results with it:
ALTER DATABASE YourDatabase SET SINGLE_USER WITH ROLLBACK IMMEDIATE;  -- substitute your database name
ALTER DATABASE YourDatabase SET READ_COMMITTED_SNAPSHOT ON;
ALTER DATABASE YourDatabase SET MULTI_USER;
It will stop readers blocking writers and vice-versa. It eliminates many deadlocks, at very little cost.
It depends on what isolation level you are using. You might try doing your selects with the WITH (NOLOCK) hint; that will prevent the read locks, but it also means the data being read might change before the selecting transaction completes.
The first thing you could do is try to add the "WITH (NOLOCK)" to any tables you have in your query. This will "Tame down" the locking that SQL Server does. An example of using "NOLOCK" on a join is as follows...
SELECT COUNT(Users.UserID)
FROM Users WITH (NOLOCK)
JOIN UsersInUserGroups WITH (NOLOCK)
    ON Users.UserID = UsersInUserGroups.UserID
Another option is to use a DataSet instead of a DataReader. A DataReader is a "fire hose" technique that stays connected to the tables while your program is processing, basically handling the table row by row through the hose. A DataSet uses a "disconnected" methodology where all the data is loaded into memory and then the connection is closed. Your program can then loop over the data in memory without having to worry about locking. However, if this is a really large amount of data, there may be memory issues.
Hope this helps.
If you add the WITH (NOLOCK) hint after a table name in the FROM clause it should make sure it doesn't lock, and it doesn't care about reading data that is locked. You might get "out of date" results if you are writing at the same time, but if you don't care about that then you should be fine.
I reckon your best way of avoiding this is to do it in SQL rather than in the application.
You can add a
WAITFOR DELAY '000:00:01'
at the end of each loop iteration to provide time for other processes to run - just make sure that you haven't initiated a TRANSACTION such that all other processes are locked out anyway
The query is performing a table lock, thus the inserts are failing.
It sounds to me like you're keeping a lock on the table while processing the results.
You should instead load them into an array or collection of some sort, and close the database connection.
Then process the array.
In addition, while you're doing your select use either:
WITH(NOLOCK) or WITH(READPAST)
I'm not a big fan of using lock hints as you could end up with dirty reads or other weirdness. A couple of other ideas:
Can you break the number of rows down so you don't grab 200k at a time? Is there a way to tell whether you've processed a row - a flag or a timestamp - that you could use in the query? Your query could be 'SELECT TOP 5000 ...', getting a different 5k each time. Shorter queries mean shorter-lived locks.
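A sketch of that batching idea, assuming a hypothetical Processed flag column and Id key on the table:
-- Grab the next unprocessed batch; short statements hold locks briefly
SELECT TOP (5000) *
FROM BigTable
WHERE Processed = 0
ORDER BY Id;

-- After the application finishes the batch, mark it done
DECLARE @lastProcessedId int = 5000;  -- highest Id the app finished (example value)
UPDATE BigTable SET Processed = 1 WHERE Id <= @lastProcessedId AND Processed = 0;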
If you can use smaller sets of rows I like the DataSet vs. IDataReader idea. You will be loading data into memory and not consuming any SQL locks, but the amount of memory can cause other problems.
-Brian
You should be able to set the isolation level at the .NET level so that you don't have to include the WITH (NOLOCK) hint.
If you want to go with the batching option, you should be able to specify the Rowcount setting from the .NET level, which would tell the database to return only n records. By applying these settings at the .NET level, they should become database independent and work across all the platforms.