In my circumstance NOLOCK, Snapshot or something else? - sql-server

I have a SQL Server 2012 table that will contain 2.5 million rows at any one time. Items are always being written into the table, but the oldest rows in the table get truncated at the end of each day during a maintenance window.
I have .NET-based reporting dashboards that usually report against summary tables though on the odd occasion it does need to fetch a few rows from this table - making use of the indexes set.
When it does report against this table, it can prevent new rows being written to this table for up to 1 minute, which is very bad for the product.
As it is a reporting platform and the rows in this table never get updated (only inserted - think Twitter streaming but for a different kind of data) it isn't always necessary to wait for a gap in the transactions that cause rows to get inserted into this table.
When it comes to selecting data for reporting, would it be wise to use a SNAPSHOT isolation level within a transaction to select the data, or NOLOCK/READ UNCOMITTED? Would creating a SQLTransaction around the select statement cause the insert to block still? At the moment I am not wrapping my SQLCommand instance in a transaction, though I realise this will still cause locking regardless.
Ideally I'd like an outcome where the writes are never blocked, and the dashboards are as responsive as possible. What is my best play?

Post your query
In theory a select should not be blocking inserts.
By default a select only takes a shared lock.
Shared locks are acquired during read operations automatically and prevent the user from modifying data.
This should not block inserts to otherTable or joinTable
select otherTable.*, joinTable.*
from otherTable
join joinTable
on otherTable.jionID = joinTable.ID
But it does have the overhead of acquiring a read lock (it does not know you don't update).
But if it is only fetching a few rows from joinTable then it should only be taking a few shared locks.
Post your query, query plan, and table definitions.
I suspect you have some weird stuff going on where it is taking a lot more locks than it needs.
It may be taking lock on each row or it may be escalating to page lock or table lock.
And look at the inserts. Is it taking some crazy locks it does not need to.

Related

How to reduce blocking during concurrent DELETE & INSERT to a single table in SQL Server

We have a stored procedure which loads order details about an order. We always want the latest information about an order, so order details for the order are regenerated every time, when the stored procedure is called. We are using SQL Server 2016.
Pseudo code:
DELETE by clustered index based on order identifier
INSERT into the table, based on a huge query containing information about order
When multiple end-users are executing the stored procedure concurrently, there is a blocking created on orderdetails table. Once the first caller is done, second caller is queued, followed by third caller. So, the time for the generation of the orderdetails increases as time goes by. This is happening especially in the cases of big orders containing details rows in > 100k or 1 or 2 million, as there is table level lock is happening.
The approach we took
We partitioned the table based on the last digit of the order identifier for concurrent orderdetails loading. This improves the performance in the case of first time orderdetails loading, as there are no deletes. But, second time onwards, INSERT in first session is causing blocking for other sessions DELETE. The other sessions are blocked till first session is done with INSERT.
We are considering creation of separate orderdetails table for every order to avoid this concurrency issues.
Question
Can you please suggest some approach, which will support concurrent DELETE & INSERT scenario ?
We solved the contention issue by going for temporary table for orderdetails. We found that huge queries are taking longer SELECT time and this longer time was contributing to longer table level locks on the orderdetails table.
So, we first loaded data into temporary table #orderdetail and then went for DELETE and INSERT in the orderdetail table.
As the orderdetail table is already partitioned, DELETE were faster and INSERT were happening in parallel. INSERT was also very fast here, as it is simple table scan from #orderdetail table.
You can give a look to the Hekaton Engine. It is available even in SQL Server Standard Edition if you are using SP1.
If this is too complicated for implementation due to hardware or software limitations, you can try to play with the Isolation Levels of the database. Sometimes, queries that are reading huge amount of data are blocked or even deadlock victims of queries which are modifying parts of these data. You can ask yourself do you need to guarantee that the data read by the user is valid or you can afford for example some dirty reads?

Avoiding Locking Contention on DB2 zOS

I want to place DB2 Triggers for Insert, Update and Delete on DB2 Tables heavily used in parallel online Transactions. The tables are shared by several members on a Sysplex, DB2 Version 10.
In each of the DB2 Triggers I want to insert a row into a central table and have one background process calling a Stored Procedure to read this table every second to process the newly inserted rows, ordered by sequence of the insert (sequence number or timestamp).
I'm very concerned about DB2 Index locking contention and want to make sure that I do not introduce Deadlocks/Timeouts to the applications with these Triggers.
Obviously I would take advantage of DB2 Features to reduce locking like rowlevel locking, but still see no real good approach how to avoid index contention.
I see three different options to select the newly inserted rows.
Put a sequence number in the table and the store the last processed sequence number in the background process. I would do the following select Statement:
SELECT COLUMN_1, .... Column_n
FROM CENTRAL_TABLE
WHERE SEQ_NO > 'last-seq-number'
ORDER BY SEQ_NO;
Locking Level must be CS to avoid selecting uncommited rows, which will be later rolled back.
I think I need one Index on the table with SEQ_NO ASC
Pro: Background process only reads rows and makes no updates/deletes (only shared locks)
Neg: Index contention because of ascending key used.
I can clean-up processed records later (e.g. by rolling partions).
Put a Status field in the table (processed and unprocessed) and change the Select as follows:
SELECT COLUMN_1, .... Column_n
FROM CENTRAL_TABLE
WHERE STATUS = 'unprocessed'
ORDER BY TIMESTAMP;
Later I would update the STATUS on the selected rows to "processed"
I think I need an Index on STATUS
Pro: No ascending sequence number in the index and no direct deletes
Cons: Concurrent updates by online transactions and the background process
Clean-up would happen in off-hours
DELETE the processed records instead of the status field update.
SELECT COLUMN_1, .... Column_n
FROM CENTRAL_TABLE
ORDER BY TIMESTAMP;
Since the table contains very few records, no index is required which could create a hot spot.
Also I think I could SELECT with Isolation Level UR, because I would detect potential uncommitted data on the later delete of this row.
For a Primary Key index I could use GENERATE_UNIQUE,which is random an not ascending.
Pro: No Index hot spot and the Inserts can be spread across the tablespace by random UNIQUE_ID
Con: Tablespace scan and sort on every call of the Stored Procedure and deleting records in parallel to the online inserts.
Looking forward what the community thinks about this problem. This must be a pretty common problem e.g. SAP should have a similar issue on their Batch Input tables.
I tend to favour Option 3, because it avoids index contention.
May be there is still another solution in your minds out there.
I think you are going to have numerous performance problems with your various solutions.
(I know premature optimazation is a sin, but experience tells us that some things are just not going to work in a busy system).
You should be able to use DB2s autoincrement feature to get your sequence number, with little or know performance implications.
For the rest perhaps you should look at a Queue based solution.
Have your trigger drop the operation (INSERT/UPDATE/DELETE) and the keys of the row into a MQ queue,
Then have a long running backgound task (in CICS?) do your post processing as its processing one update at a time you should not trip over yourself. Having a single loaded and active task with the ability to batch up units of work should give you a throughput in the order of 3 to 5 hundred updates a second.

Understanding SQL Server LOCKS on SELECT queries

I'm wondering what is the benefit to use SELECT WITH (NOLOCK) on a table if the only other queries affecting that table are SELECT queries.
How is that handled by SQL Server? Would a SELECT query block another SELECT query?
I'm using SQL Server 2012 and a Linq-to-SQL DataContext.
(EDIT)
About performance :
Would a 2nd SELECT have to wait for a 1st SELECT to finish if using a locked SELECT?
Versus a SELECT WITH (NOLOCK)?
A SELECT in SQL Server will place a shared lock on a table row - and a second SELECT would also require a shared lock, and those are compatible with one another.
So no - one SELECT cannot block another SELECT.
What the WITH (NOLOCK) query hint is used for is to be able to read data that's in the process of being inserted (by another connection) and that hasn't been committed yet.
Without that query hint, a SELECT might be blocked reading a table by an ongoing INSERT (or UPDATE) statement that places an exclusive lock on rows (or possibly a whole table), until that operation's transaction has been committed (or rolled back).
Problem of the WITH (NOLOCK) hint is: you might be reading data rows that aren't going to be inserted at all, in the end (if the INSERT transaction is rolled back) - so your e.g. report might show data that's never really been committed to the database.
There's another query hint that might be useful - WITH (READPAST). This instructs the SELECT command to just skip any rows that it attempts to read and that are locked exclusively. The SELECT will not block, and it will not read any "dirty" un-committed data - but it might skip some rows, e.g. not show all your rows in the table.
On performance you keep focusing on select.
Shared does not block reads.
Shared lock blocks update.
If you have hundreds of shared locks it is going to take an update a while to get an exclusive lock as it must wait for shared locks to clear.
By default a select (read) takes a shared lock.
Shared (S) locks allow concurrent transactions to read (SELECT) a resource.
A shared lock as no effect on other selects (1 or a 1000).
The difference is how the nolock versus shared lock effects update or insert operation.
No other transactions can modify the data while shared (S) locks exist on the resource.
A shared lock blocks an update!
But nolock does not block an update.
This can have huge impacts on performance of updates. It also impact inserts.
Dirty read (nolock) just sounds dirty. You are never going to get partial data. If an update is changing John to Sally you are never going to get Jolly.
I use shared locks a lot for concurrency. Data is stale as soon as it is read. A read of John that changes to Sally the next millisecond is stale data. A read of Sally that gets rolled back John the next millisecond is stale data. That is on the millisecond level. I have a dataloader that take 20 hours to run if users are taking shared locks and 4 hours to run is users are taking no lock. Shared locks in this case cause data to be 16 hours stale.
Don't use nolocks wrong. But they do have a place. If you are going to cut a check when a byte is set to 1 and then set it to 2 when the check is cut - not a time for a nolock.
I have to add one important comment. Everyone is mentioning that NOLOCKreads only dirty data. This is not precise. It is also possible that you'll get the same row twice or the whole row is skipped during your read. The reason is that you could ask for some data at the same time when SQL Server is re-balancing b-tree.
Check another threads
https://stackoverflow.com/a/5469238/2108874
http://www.sqlmag.com/article/sql-server/quaere-verum-clustered-index-scans-part-iii.aspx)
With the NOLOCK hint (or setting the isolation level of the session to READ UNCOMMITTED) you tell SQL Server that you don't expect consistency, so there are no guarantees. Bear in mind though that "inconsistent data" does not only mean that you might see uncommitted changes that were later rolled back, or data changes in an intermediate state of the transaction. It also means that in a simple query that scans all table/index data SQL Server may lose the scan position, or you might end up getting the same row twice.
At my work, we have a very big system that runs on many PCs at the same time, with very big tables with hundreds of thousands of rows, and sometimes many millions of rows.
When you make a SELECT on a very big table, let's say you want to know every transaction a user has made in the past 10 years, and the primary key of the table is not built in an efficient way, the query might take several minutes to run.
Then, our application might me running on many user's PCs at the same time, accessing the same database. So if someone tries to insert into the table that the other SELECT is reading (in pages that SQL is trying to read), then a LOCK can occur and the two transactions block each other.
We had to add a "NO LOCK" to our SELECT statement, because it was a huge SELECT on a table that is used a lot by a lot of users at the same time and we had LOCKS all the time.
I don't know if my example is clear enough? This is a real life example.
The SELECT WITH (NOLOCK) allows reads of uncommitted data, which is equivalent to having the READ UNCOMMITTED isolation level set on your database. The NOLOCK keyword allows finer grained control than setting the isolation level on the entire database.
Wikipedia has a useful article: Wikipedia: Isolation (database systems)
It is also discussed at length in other stackoverflow articles.
select with no lock - will select records which may / may not going to be inserted. you will read a dirty data.
for example - lets say a transaction insert 1000 rows and then fails.
when you select - you will get the 1000 rows.

Does inserting data into SQL Server lock the whole table?

I am using Entity Framework, and I am inserting records into our database which include a blob field. The blob field can be up to 5 MB of data.
When inserting a record into this table, does it lock the whole table?
So if you are querying any data from the table, will it block until the insert is done (I realise there are ways around this, but I am talking by default)?
How long will it take before it causes a deadlock? Will that time depend on how much load is on the server, e.g. if there is not much load, will it take longer to cause a deadlock?
Is there a way to monitor and see what is locked at any particular time?
If each thread is doing queries on single tables, is there then a case where blocking can occur? So isn't it the case that a deadlock can only occur if you have a query which has a join and is acting on multiple tables?
This is taking into account that most of my code is just a bunch of select statements, not heaps of long running transactions or anything like that.
Holy cow, you've got a lot of questions in here, heh. Here's a few answers:
When inserting a record into this table, does it lock the whole table?
Not by default, but if you use the TABLOCK hint or if you're doing certain kinds of bulk load operations, then yes.
So if you are querying any data from the table will it block until the insert is done (I realise there are ways around this, but I am talking by default)?
This one gets a little trickier. If someone's trying to select data from a page in the table that you've got locked, then yes, you'll block 'em. You can work around that with things like the NOLOCK hint on a select statement or by using Read Committed Snapshot Isolation. For a starting point on how isolation levels work, check out Kendra Little's isolation levels poster.
How long will it take before it causes a deadlock? Will that time depend on how much load is on the server, e.g. if there is not much load will it take longer to cause a deadlock?
Deadlocks aren't based on time - they're based on dependencies. Say we've got this situation:
Query A is holding a bunch of locks, and to finish his query, he needs stuff that's locked by Query B
Query B is also holding a bunch of locks, and to finish his query, he needs stuff that's locked by Query A
Neither query can move forward (think Mexican standoff) so SQL Server calls it a draw, shoots somebody's query in the back, releases his locks, and lets the other query keep going. SQL Server picks the victim based on which one will be less expensive to roll back. If you want to get fancy, you can use SET DEADLOCK_PRIORITY LOW on particular queries to paint targets on their back, and SQL Server will shoot them first.
Is there a way to monitor and see what is locked at any particular time?
Absolutely - there's Dynamic Management Views (DMVs) you can query like sys.dm_tran_locks, but the easiest way is to use Adam Machanic's free sp_WhoIsActive stored proc. It's a really slick replacement for sp_who that you can call like this:
sp_WhoIsActive #get_locks = 1
For each running query, you'll get a little XML that describes all of the locks it holds. There's also a Blocking column, so you can see who's blocking who. To interpret the locks being held, you'll want to check the Books Online descriptions of lock types.
If each thread is doing queries on single tables, is there then a case where blocking can occur? So isn't it the case that a deadlock can only occur if you have a query which has a join and is acting on multiple tables?
Believe it or not, a single query can actually deadlock itself, and yes, queries can deadlock on just one table. To learn even more about deadlocks, check out The Difficulty with Deadlocks by Jeremiah Peschka.
If you have direct control over the SQL, you can force row level locking using:
INSERT INTO WITH (ROWLOCK) MyTable(Id, BigColumn)
VALUES(...)
These two answers might be helpful:
Is it possible to force row level locking in SQL Server?
Locking a table with a select in Entity Framework
To view current held locks in Management Studio, look under the server, then under Management/Activity Monitor. It has a section for locks by object, so you should be able to see whether the inserts are really causing a problem.
Deadlock errors generally return quite quickly. Deadlock states do not occur as a result of a timeout error occurring while waiting for a lock. Deadlock is detected by SQL Server by looking for cycles in the lock requests.
The best answer I can come up with is: It depends.
The best way to check is to find your connection SPID and use sp_lock SPID to check if the lock mode is X on the TAB type. You can also verify the table name with SELECT OBJECT_NAME(objid). I also like to use the below query to check for locking.
SELECT RESOURCE_TYPE,RESOURCE_SUBTYPE,DB_NAME(RESOURCE_DATABASE_ID) AS 'DATABASE',resource_database_id DBID,
RESOURCE_DESCRIPTION,RESOURCE_ASSOCIATED_ENTITY_ID,REQUEST_MODE,REQUEST_SESSION_ID,
CASE WHEN RESOURCE_TYPE = 'OBJECT' THEN OBJECT_NAME(RESOURCE_ASSOCIATED_ENTITY_ID,RESOURCE_DATABASE_ID) ELSE '' END OBJETO
FROM SYS.DM_TRAN_LOCKS (NOLOCK)
WHERE REQUEST_SESSION_ID = --SPID here
In SQL Server 2008 (and later) you can disable the lock escalation on the table and enforce a WITH (ROWLOCK) in your insert clause effectively forcing a rowlock. This can't be done prior to SQL Server 2008 (you can write WITH ROWLOCK, but SQL Server can choose to ignore it).
I'm speaking generals here, and I don't have much experience with BLOBs as I usually advise developers to avoid them, especially if larger than 1 MB.

Long query prevents inserts

I have a query that runs each night on a table with a bunch of records (200,000+). This application simply iterates over the results (using a DbDataReader in a C# app if that's relevant) and processes each one. The processing is done outside of the database altogether. During the time that the application is iterating over the results I am unable to insert any records into the table that I am querying for. The insert statements just hang and eventually timeout. The inserts are done in completely separate applications.
Does SQL Server lock the table down while a query is being done? This seems like an overly aggressive locking policy. I could understand how there could be a conflict between the query and newly inserted records, but I would be perfectly ok if records inserted after the query started were simply not included in the results.
Any ways to avoid this?
Update:
The WITH (NOLOCK) definitely did the trick. As some of you pointed out, this isn't the cleanest approach. I can't really query everything into memory given the amount of records and some of the columns in this table are binary (some records are actually about 1MB of total data).
The other suggestion, was to query for batches of records at a time. This isn't a bad idea either, but it does bring up a new issue: database independent queries. Right now the application can work with a variety of different databases (Oracle, MySQL, Access, etc). Each database has their own way of limiting the rows returned in a query. But maybe this is better saved for another question?
Back on topic, the "WITH (NOLOCK)" clause is certainly SQL Server specific, is there any way to keep this out of my query (and thus preventing it from working with other databases)? Maybe I could somehow specify a parameter on the DbCommand object? Or can I specify the locking policy at the database level? That is, change some properties in SQL Server itself that will prevent the table from locking like this by default?
If you're using SQL Server 2005+, then how about giving the new MVCC snapshot isolation a try. I've had good results with it:
ALTER DATABASE SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
ALTER DATABASE SET READ_COMMITTED_SNAPSHOT ON;
ALTER DATABASE SET MULTI_USER;
It will stop readers blocking writers and vice-versa. It eliminates many deadlocks, at very little cost.
It depends what Isolation Level you are using. You might try doing your selects using the With (NoLock) hint, that will prevent the read locks, but will also mean the data being read might change before the selecting transaction completes.
The first thing you could do is try to add the "WITH (NOLOCK)" to any tables you have in your query. This will "Tame down" the locking that SQL Server does. An example of using "NOLOCK" on a join is as follows...
SELECT COUNT(Users.UserID)
FROM Users WITH (NOLOCK)
JOIN UsersInUserGroups WITH (NOLOCK) ON
Users.UserID = UsersInUserGroups.UserID
Another option is to use a dataset instead of a datareader. A datareader is a "fire hose" technique that stays connected to the tables while your program is processing and basically handling the table row by row through the hose. A dataset uses a "disconnected" methodology where all the data is loaded into memory and then the connection is closed. Your program can then loop the data in memory without having to worry about locking. However, if this is a really large amount of data, there maybe memory issues.
Hope this helps.
If you add the WITH (NOLOCK) hint after a table name in the FROM clause it should make sure it doesn't lock, and it doesn't care about reading data that is locked. You might get "out of date" results if you are writing at the same time, but if you don't care about that then you should be fine.
I reckon your best way of avoiding this is to do it in SQL rather than in the application.
You can add a
WAITFOR DELAY '000:00:01'
at the end of each loop iteration to provide time for other processes to run - just make sure that you haven't initiated a TRANSACTION such that all other processes are locked out anyway
The query is performing a table lock, thus the inserts are failing.
It sounds to me like you're keeping a lock on the table while processing the results.
You should instead load them into an array or collection of some sort, and close the database connection.
Then process the array.
In addition, while you're doing your select use either:
WITH(NOLOCK) or WITH(READPAST)
I'm not a big fan of using lock hints as you could end up with dirty reads or other weirdness. A couple of other ideas:
Can you break the number of rows down so you don't grab 200k at a time? Is there a way to tell whether you've processed a row - a flag, a timestamp - you could use to make the query? Your query could be 'SELECT TOP 5000 ...' getting a differnet 5k each time. Shorter queries mean shorter-lived locks.
If you can use smaller sets of rows I like the DataSet vs. IDataReader idea. You will be loading data into memory and not consuming any SQL locks, but the amount of memory can cause other problems.
-Brian
You should be able to set the isolation level at the .NET level so that you don't have to include the WITH (NOLOCK) hint.
If you want to go with the batching option, you should be able to specify the Rowcount setting from the .NET level which would tell the database to only return n number of records. By setting these settings at the .NET level they should become database independent and work across all the platforms.

Resources