I'm wondering what is the benefit to use SELECT WITH (NOLOCK) on a table if the only other queries affecting that table are SELECT queries.
How is that handled by SQL Server? Would a SELECT query block another SELECT query?
I'm using SQL Server 2012 and a Linq-to-SQL DataContext.
(EDIT)
About performance :
Would a 2nd SELECT have to wait for a 1st SELECT to finish if using a locked SELECT?
Versus a SELECT WITH (NOLOCK)?
A SELECT in SQL Server will place a shared lock on a table row - and a second SELECT would also require a shared lock, and those are compatible with one another.
So no - one SELECT cannot block another SELECT.
What the WITH (NOLOCK) query hint is used for is to be able to read data that's in the process of being inserted (by another connection) and that hasn't been committed yet.
Without that query hint, a SELECT might be blocked reading a table by an ongoing INSERT (or UPDATE) statement that places an exclusive lock on rows (or possibly a whole table), until that operation's transaction has been committed (or rolled back).
Problem of the WITH (NOLOCK) hint is: you might be reading data rows that aren't going to be inserted at all, in the end (if the INSERT transaction is rolled back) - so your e.g. report might show data that's never really been committed to the database.
There's another query hint that might be useful - WITH (READPAST). This instructs the SELECT command to just skip any rows that it attempts to read and that are locked exclusively. The SELECT will not block, and it will not read any "dirty" un-committed data - but it might skip some rows, e.g. not show all your rows in the table.
On performance you keep focusing on select.
Shared does not block reads.
Shared lock blocks update.
If you have hundreds of shared locks it is going to take an update a while to get an exclusive lock as it must wait for shared locks to clear.
By default a select (read) takes a shared lock.
Shared (S) locks allow concurrent transactions to read (SELECT) a resource.
A shared lock as no effect on other selects (1 or a 1000).
The difference is how the nolock versus shared lock effects update or insert operation.
No other transactions can modify the data while shared (S) locks exist on the resource.
A shared lock blocks an update!
But nolock does not block an update.
This can have huge impacts on performance of updates. It also impact inserts.
Dirty read (nolock) just sounds dirty. You are never going to get partial data. If an update is changing John to Sally you are never going to get Jolly.
I use shared locks a lot for concurrency. Data is stale as soon as it is read. A read of John that changes to Sally the next millisecond is stale data. A read of Sally that gets rolled back John the next millisecond is stale data. That is on the millisecond level. I have a dataloader that take 20 hours to run if users are taking shared locks and 4 hours to run is users are taking no lock. Shared locks in this case cause data to be 16 hours stale.
Don't use nolocks wrong. But they do have a place. If you are going to cut a check when a byte is set to 1 and then set it to 2 when the check is cut - not a time for a nolock.
I have to add one important comment. Everyone is mentioning that NOLOCKreads only dirty data. This is not precise. It is also possible that you'll get the same row twice or the whole row is skipped during your read. The reason is that you could ask for some data at the same time when SQL Server is re-balancing b-tree.
Check another threads
https://stackoverflow.com/a/5469238/2108874
http://www.sqlmag.com/article/sql-server/quaere-verum-clustered-index-scans-part-iii.aspx)
With the NOLOCK hint (or setting the isolation level of the session to READ UNCOMMITTED) you tell SQL Server that you don't expect consistency, so there are no guarantees. Bear in mind though that "inconsistent data" does not only mean that you might see uncommitted changes that were later rolled back, or data changes in an intermediate state of the transaction. It also means that in a simple query that scans all table/index data SQL Server may lose the scan position, or you might end up getting the same row twice.
At my work, we have a very big system that runs on many PCs at the same time, with very big tables with hundreds of thousands of rows, and sometimes many millions of rows.
When you make a SELECT on a very big table, let's say you want to know every transaction a user has made in the past 10 years, and the primary key of the table is not built in an efficient way, the query might take several minutes to run.
Then, our application might me running on many user's PCs at the same time, accessing the same database. So if someone tries to insert into the table that the other SELECT is reading (in pages that SQL is trying to read), then a LOCK can occur and the two transactions block each other.
We had to add a "NO LOCK" to our SELECT statement, because it was a huge SELECT on a table that is used a lot by a lot of users at the same time and we had LOCKS all the time.
I don't know if my example is clear enough? This is a real life example.
The SELECT WITH (NOLOCK) allows reads of uncommitted data, which is equivalent to having the READ UNCOMMITTED isolation level set on your database. The NOLOCK keyword allows finer grained control than setting the isolation level on the entire database.
Wikipedia has a useful article: Wikipedia: Isolation (database systems)
It is also discussed at length in other stackoverflow articles.
select with no lock - will select records which may / may not going to be inserted. you will read a dirty data.
for example - lets say a transaction insert 1000 rows and then fails.
when you select - you will get the 1000 rows.
Related
I have a SQL Server 2012 table that will contain 2.5 million rows at any one time. Items are always being written into the table, but the oldest rows in the table get truncated at the end of each day during a maintenance window.
I have .NET-based reporting dashboards that usually report against summary tables though on the odd occasion it does need to fetch a few rows from this table - making use of the indexes set.
When it does report against this table, it can prevent new rows being written to this table for up to 1 minute, which is very bad for the product.
As it is a reporting platform and the rows in this table never get updated (only inserted - think Twitter streaming but for a different kind of data) it isn't always necessary to wait for a gap in the transactions that cause rows to get inserted into this table.
When it comes to selecting data for reporting, would it be wise to use a SNAPSHOT isolation level within a transaction to select the data, or NOLOCK/READ UNCOMITTED? Would creating a SQLTransaction around the select statement cause the insert to block still? At the moment I am not wrapping my SQLCommand instance in a transaction, though I realise this will still cause locking regardless.
Ideally I'd like an outcome where the writes are never blocked, and the dashboards are as responsive as possible. What is my best play?
Post your query
In theory a select should not be blocking inserts.
By default a select only takes a shared lock.
Shared locks are acquired during read operations automatically and prevent the user from modifying data.
This should not block inserts to otherTable or joinTable
select otherTable.*, joinTable.*
from otherTable
join joinTable
on otherTable.jionID = joinTable.ID
But it does have the overhead of acquiring a read lock (it does not know you don't update).
But if it is only fetching a few rows from joinTable then it should only be taking a few shared locks.
Post your query, query plan, and table definitions.
I suspect you have some weird stuff going on where it is taking a lot more locks than it needs.
It may be taking lock on each row or it may be escalating to page lock or table lock.
And look at the inserts. Is it taking some crazy locks it does not need to.
I am using Entity Framework, and I am inserting records into our database which include a blob field. The blob field can be up to 5 MB of data.
When inserting a record into this table, does it lock the whole table?
So if you are querying any data from the table, will it block until the insert is done (I realise there are ways around this, but I am talking by default)?
How long will it take before it causes a deadlock? Will that time depend on how much load is on the server, e.g. if there is not much load, will it take longer to cause a deadlock?
Is there a way to monitor and see what is locked at any particular time?
If each thread is doing queries on single tables, is there then a case where blocking can occur? So isn't it the case that a deadlock can only occur if you have a query which has a join and is acting on multiple tables?
This is taking into account that most of my code is just a bunch of select statements, not heaps of long running transactions or anything like that.
Holy cow, you've got a lot of questions in here, heh. Here's a few answers:
When inserting a record into this table, does it lock the whole table?
Not by default, but if you use the TABLOCK hint or if you're doing certain kinds of bulk load operations, then yes.
So if you are querying any data from the table will it block until the insert is done (I realise there are ways around this, but I am talking by default)?
This one gets a little trickier. If someone's trying to select data from a page in the table that you've got locked, then yes, you'll block 'em. You can work around that with things like the NOLOCK hint on a select statement or by using Read Committed Snapshot Isolation. For a starting point on how isolation levels work, check out Kendra Little's isolation levels poster.
How long will it take before it causes a deadlock? Will that time depend on how much load is on the server, e.g. if there is not much load will it take longer to cause a deadlock?
Deadlocks aren't based on time - they're based on dependencies. Say we've got this situation:
Query A is holding a bunch of locks, and to finish his query, he needs stuff that's locked by Query B
Query B is also holding a bunch of locks, and to finish his query, he needs stuff that's locked by Query A
Neither query can move forward (think Mexican standoff) so SQL Server calls it a draw, shoots somebody's query in the back, releases his locks, and lets the other query keep going. SQL Server picks the victim based on which one will be less expensive to roll back. If you want to get fancy, you can use SET DEADLOCK_PRIORITY LOW on particular queries to paint targets on their back, and SQL Server will shoot them first.
Is there a way to monitor and see what is locked at any particular time?
Absolutely - there's Dynamic Management Views (DMVs) you can query like sys.dm_tran_locks, but the easiest way is to use Adam Machanic's free sp_WhoIsActive stored proc. It's a really slick replacement for sp_who that you can call like this:
sp_WhoIsActive #get_locks = 1
For each running query, you'll get a little XML that describes all of the locks it holds. There's also a Blocking column, so you can see who's blocking who. To interpret the locks being held, you'll want to check the Books Online descriptions of lock types.
If each thread is doing queries on single tables, is there then a case where blocking can occur? So isn't it the case that a deadlock can only occur if you have a query which has a join and is acting on multiple tables?
Believe it or not, a single query can actually deadlock itself, and yes, queries can deadlock on just one table. To learn even more about deadlocks, check out The Difficulty with Deadlocks by Jeremiah Peschka.
If you have direct control over the SQL, you can force row level locking using:
INSERT INTO WITH (ROWLOCK) MyTable(Id, BigColumn)
VALUES(...)
These two answers might be helpful:
Is it possible to force row level locking in SQL Server?
Locking a table with a select in Entity Framework
To view current held locks in Management Studio, look under the server, then under Management/Activity Monitor. It has a section for locks by object, so you should be able to see whether the inserts are really causing a problem.
Deadlock errors generally return quite quickly. Deadlock states do not occur as a result of a timeout error occurring while waiting for a lock. Deadlock is detected by SQL Server by looking for cycles in the lock requests.
The best answer I can come up with is: It depends.
The best way to check is to find your connection SPID and use sp_lock SPID to check if the lock mode is X on the TAB type. You can also verify the table name with SELECT OBJECT_NAME(objid). I also like to use the below query to check for locking.
SELECT RESOURCE_TYPE,RESOURCE_SUBTYPE,DB_NAME(RESOURCE_DATABASE_ID) AS 'DATABASE',resource_database_id DBID,
RESOURCE_DESCRIPTION,RESOURCE_ASSOCIATED_ENTITY_ID,REQUEST_MODE,REQUEST_SESSION_ID,
CASE WHEN RESOURCE_TYPE = 'OBJECT' THEN OBJECT_NAME(RESOURCE_ASSOCIATED_ENTITY_ID,RESOURCE_DATABASE_ID) ELSE '' END OBJETO
FROM SYS.DM_TRAN_LOCKS (NOLOCK)
WHERE REQUEST_SESSION_ID = --SPID here
In SQL Server 2008 (and later) you can disable the lock escalation on the table and enforce a WITH (ROWLOCK) in your insert clause effectively forcing a rowlock. This can't be done prior to SQL Server 2008 (you can write WITH ROWLOCK, but SQL Server can choose to ignore it).
I'm speaking generals here, and I don't have much experience with BLOBs as I usually advise developers to avoid them, especially if larger than 1 MB.
We're using a SQL Server 2005 database (no row versioning) with a huge select statement, and we're seeing it block other statements from running (seen using sp_who2). I didn't realise SELECT statements could cause blocking - is there anything I can do to mitigate this?
SELECT can block updates. A properly designed data model and query will only cause minimal blocking and not be an issue. The 'usual' WITH NOLOCK hint is almost always the wrong answer. The proper answer is to tune your query so it does not scan huge tables.
If the query is untunable then you should first consider SNAPSHOT ISOLATION level, second you should consider using DATABASE SNAPSHOTS and last option should be DIRTY READS (and is better to change the isolation level rather than using the NOLOCK HINT). Note that dirty reads, as the name clearly states, will return inconsistent data (eg. your total sheet may be unbalanced).
From documentation:
Shared (S) locks allow concurrent transactions to read (SELECT) a resource under pessimistic concurrency control. For more information, see Types of Concurrency Control. No other transactions can modify the data while shared (S) locks exist on the resource. Shared (S) locks on a resource are released as soon as the read operation completes, unless the transaction isolation level is set to repeatable read or higher, or a locking hint is used to retain the shared (S) locks for the duration of the transaction.
A shared lock is compatible with another shared lock or an update lock, but not with an exlusive lock.
That means that your SELECT queries will block UPDATE and INSERT queries and vice versa.
A SELECT query will place a temporary shared lock when it reads a block of values from the table, and remove it when it done reading.
For the time the lock exists, you will not be able to do anything with the data in the locked area.
Two SELECT queries will never block each other (unless they are SELECT FOR UPDATE)
You can enable SNAPSHOT isolation level on your database and use it, but note that it will not prevent UPDATE queries from being locked by SELECT queries (which seems to be your case).
It, though, will prevent SELECT queries from being locked by UPDATE.
Also note that SQL Server, unlike Oracle, uses lock manager and keeps it locks in an in-memory linked list.
That means that under heavy load, the mere fact of placing and removing a lock may be slow, since the linked list should itself be locked by the transaction thread.
To perform dirty reads you can either:
using (new TransactionScope(TransactionScopeOption.Required,
new TransactionOptions {
IsolationLevel = System.Transactions.IsolationLevel.ReadUncommitted }))
{
//Your code here
}
or
SelectCommand = "SELECT * FROM Table1 WITH (NOLOCK) INNER JOIN Table2 WITH (NOLOCK) ..."
remember that you have to write WITH (NOLOCK) after every table you want to dirty read
You could set the transaction level to Read Uncommitted
You might also get deadlocks:
"deadlocks involving only one table"
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/01/reproducing-deadlocks-involving-only-one-table.aspx
and or incorrect results:
"Selects under READ COMMITTED and REPEATABLE READ may return incorrect results."
http://www2.sqlblog.com/blogs/alexander_kuznetsov/archive/2009/04/10/selects-under-read-committed-and-repeatable-read-may-return-incorrect-results.aspx
You can use WITH(READPAST) table hint. It's different than the WITH(NOLOCK). It will get the data before the transaction was started and will not block anyone. Imagine that, you ran the statement before the transaction was started.
SELECT * FROM table1 WITH (READPAST)
I've got in an ASP.NET application this process :
Start a connection
Start a transaction
Insert into a table "LoadData" a lot of values with the SqlBulkCopy class with a column that contains a specific LoadId.
Call a stored procedure that :
read the table "LoadData" for the specific LoadId.
For each line does a lot of calculations which implies reading dozens of tables and write the results into a temporary (#temp) table (process that last several minutes).
Deletes the lines in "LoadDate" for the specific LoadId.
Once everything is done, write the result in the result table.
Commit transaction or rollback if something fails.
My problem is that if I have 2 users that start the process, the second one will have to wait that the previous has finished (because the insert seems to put an exclusive lock on the table) and my application sometimes falls in timeout (and the users are not happy to wait :) ).
I'm looking for a way to be able to have the users that does everything in parallel as there is no interaction, except the last one: writing the result. I think that what is blocking me is the inserts / deletes in the "LoadData" table.
I checked the other transaction isolation levels but it seems that nothing could help me.
What would be perfect would be to be able to remove the exclusive lock on the "LoadData" table (is it possible to force SqlServer to only lock rows and not table ?) when the Insert is finished, but without ending the transaction.
Any suggestion?
Look up SET TRANSACTION ISOLATION LEVEL READ COMMITTED SNAPSHOT in Books OnLine.
Transactions should cover small and fast-executing pieces of SQL / code. They have a tendancy to be implemented differently on different platforms. They will lock tables and then expand the lock as the modifications grow thus locking out the other users from querying or updating the same row / page / table.
Why not forget the transaction, and handle processing errors in another way? Is your data integrity truely being secured by the transaction, or can you do without it?
if you're sure that there is no issue with cioncurrent operations except the last part, why not start the transaction just before those last statements, Whichever they are that DO require isolation), and commit immediately after they succeed.. Then all the upfront read operations will not block each other...
I have a query that runs each night on a table with a bunch of records (200,000+). This application simply iterates over the results (using a DbDataReader in a C# app if that's relevant) and processes each one. The processing is done outside of the database altogether. During the time that the application is iterating over the results I am unable to insert any records into the table that I am querying for. The insert statements just hang and eventually timeout. The inserts are done in completely separate applications.
Does SQL Server lock the table down while a query is being done? This seems like an overly aggressive locking policy. I could understand how there could be a conflict between the query and newly inserted records, but I would be perfectly ok if records inserted after the query started were simply not included in the results.
Any ways to avoid this?
Update:
The WITH (NOLOCK) definitely did the trick. As some of you pointed out, this isn't the cleanest approach. I can't really query everything into memory given the amount of records and some of the columns in this table are binary (some records are actually about 1MB of total data).
The other suggestion, was to query for batches of records at a time. This isn't a bad idea either, but it does bring up a new issue: database independent queries. Right now the application can work with a variety of different databases (Oracle, MySQL, Access, etc). Each database has their own way of limiting the rows returned in a query. But maybe this is better saved for another question?
Back on topic, the "WITH (NOLOCK)" clause is certainly SQL Server specific, is there any way to keep this out of my query (and thus preventing it from working with other databases)? Maybe I could somehow specify a parameter on the DbCommand object? Or can I specify the locking policy at the database level? That is, change some properties in SQL Server itself that will prevent the table from locking like this by default?
If you're using SQL Server 2005+, then how about giving the new MVCC snapshot isolation a try. I've had good results with it:
ALTER DATABASE SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
ALTER DATABASE SET READ_COMMITTED_SNAPSHOT ON;
ALTER DATABASE SET MULTI_USER;
It will stop readers blocking writers and vice-versa. It eliminates many deadlocks, at very little cost.
It depends what Isolation Level you are using. You might try doing your selects using the With (NoLock) hint, that will prevent the read locks, but will also mean the data being read might change before the selecting transaction completes.
The first thing you could do is try to add the "WITH (NOLOCK)" to any tables you have in your query. This will "Tame down" the locking that SQL Server does. An example of using "NOLOCK" on a join is as follows...
SELECT COUNT(Users.UserID)
FROM Users WITH (NOLOCK)
JOIN UsersInUserGroups WITH (NOLOCK) ON
Users.UserID = UsersInUserGroups.UserID
Another option is to use a dataset instead of a datareader. A datareader is a "fire hose" technique that stays connected to the tables while your program is processing and basically handling the table row by row through the hose. A dataset uses a "disconnected" methodology where all the data is loaded into memory and then the connection is closed. Your program can then loop the data in memory without having to worry about locking. However, if this is a really large amount of data, there maybe memory issues.
Hope this helps.
If you add the WITH (NOLOCK) hint after a table name in the FROM clause it should make sure it doesn't lock, and it doesn't care about reading data that is locked. You might get "out of date" results if you are writing at the same time, but if you don't care about that then you should be fine.
I reckon your best way of avoiding this is to do it in SQL rather than in the application.
You can add a
WAITFOR DELAY '000:00:01'
at the end of each loop iteration to provide time for other processes to run - just make sure that you haven't initiated a TRANSACTION such that all other processes are locked out anyway
The query is performing a table lock, thus the inserts are failing.
It sounds to me like you're keeping a lock on the table while processing the results.
You should instead load them into an array or collection of some sort, and close the database connection.
Then process the array.
In addition, while you're doing your select use either:
WITH(NOLOCK) or WITH(READPAST)
I'm not a big fan of using lock hints as you could end up with dirty reads or other weirdness. A couple of other ideas:
Can you break the number of rows down so you don't grab 200k at a time? Is there a way to tell whether you've processed a row - a flag, a timestamp - you could use to make the query? Your query could be 'SELECT TOP 5000 ...' getting a differnet 5k each time. Shorter queries mean shorter-lived locks.
If you can use smaller sets of rows I like the DataSet vs. IDataReader idea. You will be loading data into memory and not consuming any SQL locks, but the amount of memory can cause other problems.
-Brian
You should be able to set the isolation level at the .NET level so that you don't have to include the WITH (NOLOCK) hint.
If you want to go with the batching option, you should be able to specify the Rowcount setting from the .NET level which would tell the database to only return n number of records. By setting these settings at the .NET level they should become database independent and work across all the platforms.