My boss keeps forcing me to write SELECT queries WITH (NOLOCK) to prevent deadlocks. But AFAIK, SELECT statements by default do not take locks, so selecting WITH (NOLOCK) and selecting without it makes no difference. Please correct me if I am wrong.
The two queries:
SELECT * from EMP with (nolock)
SELECT * from EMP
Aren't both the same? If I don't put NOLOCK, will the query be prone to deadlocks? Please tell me what I should use.
NOLOCK should be used with extreme caution. The most common understanding of the NOLOCK (read uncommitted) hint is that it reads data that has not been committed yet. However, there are other side effects that can be very dangerous (search for "nolock" and "page splits").
There's a really good write-up here: https://www.itprotoday.com/sql-server/beware-nolock-hint
In short, "nolocking"ing everything is not always a good idea... if ever.
Assuming we have the default transaction isolation level READ COMMITTED, there is a chance of a deadlock even with a very simple SELECT statement. Imagine a scenario where User1 is only reading data, User2 tries to update some data, and there is a non-clustered index on that table:
User1 reads some data and obtains a shared lock on the non-clustered index in order to perform a lookup, and then tries to obtain a shared lock on the page containing the data in order to return the data itself.
User2, who is writing/updating, first obtains an exclusive lock on the database page containing the data, and then attempts to obtain an exclusive lock on the index in order to update the index.
SELECT statements do indeed take locks unless there is a SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED statement at the top of the query.
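For example, this session-level setting is roughly equivalent to putting WITH (NOLOCK) on every table in the session's queries (a minimal sketch, reusing the EMP table from the question):

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT * FROM EMP;   -- now reads without requesting shared locks

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;   -- restore the default afterwards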
By all means use WITH (NOLOCK) in SELECT statements on tables that have a clustered index, but it would be wiser to only do so if there's a need to.
Hint: The easiest way to add a clustered index to a table is to add an Id Primary Key column.
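For example (a minimal sketch against the EMP table from the question; the column and constraint names are made up):

ALTER TABLE EMP
    ADD Id INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_EMP PRIMARY KEY CLUSTERED;   -- the primary key creates the clustered index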
The result set can contain rows that have not yet been committed, and that are often later rolled back.
If WITH(NOLOCK) is applied to a table that has a non-clustered index then row-indexes can be changed by other transactions as the row data is being streamed into the result-table. This means that the result-set can be missing rows or display the same row multiple times.
READ COMMITTED adds an additional issue where data within a single column can become corrupted when multiple users change the same cell simultaneously.
Bearing in mind the issues WITH(NOLOCK) causes will help you tune your database.
As for your boss, just think of them as a challenge.
I have a database with READ_COMMITTED_SNAPSHOT_ISOLATION set ON (cannot change that).
I insert new rows into a table on many parallel sessions,
but only if they don't exist there yet (classic left join check).
The inserting code looks like this:
INSERT INTO dbo.Destination(OrderID)
SELECT DISTINCT s.OrderID
FROM dbo.Source s
LEFT JOIN dbo.Destination d ON d.OrderID = s.OrderID
WHERE d.OrderID IS NULL;
If I run this on many parallel sessions I get a lot of duplicate key errors,
since different sessions try to insert the same OrderIDs over and over again.
That is expected due to the lack of SHARED locks under RCSI.
The recommended solution here (as per my research) would be to use the READCOMMITTEDLOCK hint like this:
LEFT JOIN dbo.Destination d WITH (READCOMMITTEDLOCK) ON d.OrderID = s.OrderID
This somewhat works, as it greatly reduces the duplicate key errors, but (to my surprise) doesn't completely eliminate them.
As an experiment I removed the unique constraint on the Destination table, and saw that many duplicates entered the table in the very same millisecond, originating from different sessions.
It seems that despite the table hint, I still get false positives on the existence check, and the redundant insert fires.
I tried different hints (SERIALIZABLE) but it made it worse and swarmed me with deadlocks.
How could I make this insert work under RCSI?
The right lock hint for reading a table you are about to insert into is (UPDLOCK,HOLDLOCK), which will place U locks on the rows as you read them, and also place SERIALIZABLE-style range locks if the row doesn't exist.
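Applied to the query from the question, that would look roughly like this (a sketch of the hint placement only; as explained below, row-level locking still has problems with batch inserts):

INSERT INTO dbo.Destination (OrderID)
SELECT DISTINCT s.OrderID
FROM dbo.Source s
LEFT JOIN dbo.Destination d WITH (UPDLOCK, HOLDLOCK)   -- U locks plus key-range locks on the existence check
    ON d.OrderID = s.OrderID
WHERE d.OrderID IS NULL;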
The problem with your approach is that each client is attempting to insert a batch of rows, and each batch has to either succeed completely or fail. If you use row-level locking, you will always have scenarios where a session inserts one row successfully, but then becomes blocked waiting to read or insert a subsequent row. This inevitably leads to either PK failures or deadlocks, depending on the type of row lock used.
The solution is to either:
1) Insert the rows one by one, and do not hold the locks from one row while you check and insert the next row.
2) Simply escalate to a TABLOCKX, or an application lock, to force your concurrent sessions to serialize through this bit of code.
So you can have highly-concurrent loads, or batch loads, but you can't have both. Well mostly.
3) You could turn on IGNORE_DUP_KEY on the index, which will simply skip any duplicates when inserting instead of raising an error.
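For option 3, the setting lives on the unique index itself (a minimal sketch; the index name is made up and assumes the existing unique constraint is replaced by this index):

CREATE UNIQUE INDEX UX_Destination_OrderID
    ON dbo.Destination (OrderID)
    WITH (IGNORE_DUP_KEY = ON);   -- duplicate key inserts become warnings and the duplicate rows are skipped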
I want to place DB2 Triggers for Insert, Update and Delete on DB2 Tables heavily used in parallel online Transactions. The tables are shared by several members on a Sysplex, DB2 Version 10.
In each of the DB2 Triggers I want to insert a row into a central table and have one background process calling a Stored Procedure to read this table every second to process the newly inserted rows, ordered by sequence of the insert (sequence number or timestamp).
I'm very concerned about DB2 Index locking contention and want to make sure that I do not introduce Deadlocks/Timeouts to the applications with these Triggers.
Obviously I would take advantage of DB2 features to reduce locking, like row-level locking, but I still see no really good approach to avoid index contention.
I see three different options to select the newly inserted rows.
Put a sequence number in the table and store the last processed sequence number in the background process. I would do the following SELECT statement:
SELECT COLUMN_1, .... Column_n
FROM CENTRAL_TABLE
WHERE SEQ_NO > 'last-seq-number'
ORDER BY SEQ_NO;
Locking level must be CS to avoid selecting uncommitted rows, which might later be rolled back.
I think I need one Index on the table with SEQ_NO ASC
Pro: Background process only reads rows and makes no updates/deletes (only shared locks)
Con: Index contention because of the ascending key used.
I can clean-up processed records later (e.g. by rolling partions).
Put a Status field in the table (processed and unprocessed) and change the Select as follows:
SELECT COLUMN_1, .... Column_n
FROM CENTRAL_TABLE
WHERE STATUS = 'unprocessed'
ORDER BY TIMESTAMP;
Later I would update the STATUS on the selected rows to "processed"
I think I need an Index on STATUS
Pro: No ascending sequence number in the index and no direct deletes
Con: Concurrent updates by online transactions and the background process
Clean-up would happen in off-hours
DELETE the processed records instead of the status field update.
SELECT COLUMN_1, .... Column_n
FROM CENTRAL_TABLE
ORDER BY TIMESTAMP;
Since the table contains very few records, no index is required which could create a hot spot.
Also I think I could SELECT with Isolation Level UR, because I would detect potential uncommitted data on the later delete of this row.
For a Primary Key index I could use GENERATE_UNIQUE, which is random and not ascending.
Pro: No Index hot spot and the Inserts can be spread across the tablespace by random UNIQUE_ID
Con: Tablespace scan and sort on every call of the Stored Procedure and deleting records in parallel to the online inserts.
Looking forward what the community thinks about this problem. This must be a pretty common problem e.g. SAP should have a similar issue on their Batch Input tables.
I tend to favour Option 3, because it avoids index contention.
Maybe there is still another solution in your minds out there.
I think you are going to have numerous performance problems with your various solutions.
(I know premature optimization is a sin, but experience tells us that some things are just not going to work in a busy system.)
You should be able to use DB2's autoincrement feature to get your sequence number, with little or no performance implications.
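For example, the central table's DDL could look roughly like this (a sketch; the column names and types are made up):

CREATE TABLE CENTRAL_TABLE
  (SEQ_NO    BIGINT      NOT NULL GENERATED ALWAYS AS IDENTITY,
   OPERATION CHAR(1)     NOT NULL,               -- 'I', 'U' or 'D'
   ROW_KEY   VARCHAR(64) NOT NULL,               -- key of the changed source row
   CREATE_TS TIMESTAMP   NOT NULL WITH DEFAULT);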
For the rest perhaps you should look at a Queue based solution.
Have your trigger drop the operation (INSERT/UPDATE/DELETE) and the keys of the row into an MQ queue,
then have a long-running background task (in CICS?) do your post-processing; as it is processing one update at a time, you should not trip over yourself. Having a single loaded and active task with the ability to batch up units of work should give you a throughput in the order of 3 to 5 hundred updates a second.
I have a SQL Server 2012 table that will contain 2.5 million rows at any one time. Items are always being written into the table, but the oldest rows in the table get truncated at the end of each day during a maintenance window.
I have .NET-based reporting dashboards that usually report against summary tables, though on the odd occasion they do need to fetch a few rows from this table - making use of the indexes set.
When it does report against this table, it can prevent new rows being written to this table for up to 1 minute, which is very bad for the product.
As it is a reporting platform and the rows in this table never get updated (only inserted - think Twitter streaming but for a different kind of data) it isn't always necessary to wait for a gap in the transactions that cause rows to get inserted into this table.
When it comes to selecting data for reporting, would it be wise to use a SNAPSHOT isolation level within a transaction to select the data, or NOLOCK/READ UNCOMMITTED? Would creating a SqlTransaction around the select statement still cause the insert to block? At the moment I am not wrapping my SqlCommand instance in a transaction, though I realise this will still cause locking regardless.
Ideally I'd like an outcome where the writes are never blocked, and the dashboards are as responsive as possible. What is my best play?
Post your query
In theory a select should not be blocking inserts.
By default a select only takes a shared lock.
Shared locks are acquired during read operations automatically and prevent the user from modifying data.
This should not block inserts to otherTable or joinTable
select otherTable.*, joinTable.*
from otherTable
join joinTable
on otherTable.joinID = joinTable.ID
But it does have the overhead of acquiring a read lock (it does not know you won't update).
But if it is only fetching a few rows from joinTable then it should only be taking a few shared locks.
Post your query, query plan, and table definitions.
I suspect you have some weird stuff going on where it is taking a lot more locks than it needs.
It may be taking lock on each row or it may be escalating to page lock or table lock.
And look at the inserts. Are they taking some crazy locks they do not need?
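One way to see what a session is actually locking (a quick sketch using the standard locking DMV; run it from another connection while the problem query is executing and filter on the suspect session id):

SELECT  request_session_id,
        resource_type,          -- KEY, PAGE, OBJECT, ...
        request_mode,           -- S, IS, X, IX, ...
        request_status,         -- GRANT or WAIT
        COUNT(*) AS lock_count
FROM    sys.dm_tran_locks
WHERE   resource_database_id = DB_ID()
GROUP BY request_session_id, resource_type, request_mode, request_status
ORDER BY request_session_id, lock_count DESC;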
I'm wondering what the benefit is of using SELECT WITH (NOLOCK) on a table if the only other queries affecting that table are SELECT queries.
How is that handled by SQL Server? Would a SELECT query block another SELECT query?
I'm using SQL Server 2012 and a Linq-to-SQL DataContext.
(EDIT)
About performance :
Would a 2nd SELECT have to wait for a 1st SELECT to finish if using a locked SELECT?
Versus a SELECT WITH (NOLOCK)?
A SELECT in SQL Server will place a shared lock on a table row - and a second SELECT would also require a shared lock, and those are compatible with one another.
So no - one SELECT cannot block another SELECT.
What the WITH (NOLOCK) query hint is used for is to be able to read data that's in the process of being inserted (by another connection) and that hasn't been committed yet.
Without that query hint, a SELECT might be blocked reading a table by an ongoing INSERT (or UPDATE) statement that places an exclusive lock on rows (or possibly a whole table), until that operation's transaction has been committed (or rolled back).
Problem of the WITH (NOLOCK) hint is: you might be reading data rows that aren't going to be inserted at all, in the end (if the INSERT transaction is rolled back) - so your e.g. report might show data that's never really been committed to the database.
There's another query hint that might be useful - WITH (READPAST). This instructs the SELECT command to just skip any rows that it attempts to read and that are locked exclusively. The SELECT will not block, and it will not read any "dirty" un-committed data - but it might skip some rows, e.g. not show all your rows in the table.
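A small side-by-side sketch (against a hypothetical dbo.Orders table):

-- dirty read: returns rows from uncommitted transactions as well
SELECT * FROM dbo.Orders WITH (NOLOCK);

-- skip-locked read: never blocks and never reads dirty data, but silently omits locked rows
SELECT * FROM dbo.Orders WITH (READPAST);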
On performance: you keep focusing on the select.
Shared does not block reads.
Shared lock blocks update.
If you have hundreds of shared locks it is going to take an update a while to get an exclusive lock as it must wait for shared locks to clear.
By default a select (read) takes a shared lock.
Shared (S) locks allow concurrent transactions to read (SELECT) a resource.
A shared lock has no effect on other selects (1 or 1000).
The difference is in how NOLOCK versus a shared lock affects update or insert operations.
No other transactions can modify the data while shared (S) locks exist on the resource.
A shared lock blocks an update!
But nolock does not block an update.
This can have huge impacts on the performance of updates. It also impacts inserts.
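A quick way to see the blocking for yourself (a sketch with two sessions and a made-up dbo.Accounts table; HOLDLOCK is only there to keep the shared lock held long enough to observe the effect):

-- Session 1: read and keep the shared lock until the transaction ends
BEGIN TRAN;
SELECT Balance FROM dbo.Accounts WITH (HOLDLOCK) WHERE AccountId = 1;

-- Session 2: this update is blocked until session 1 commits or rolls back
UPDATE dbo.Accounts SET Balance = Balance + 10 WHERE AccountId = 1;

-- Session 1: releasing the shared lock lets the update through
COMMIT;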
Dirty read (nolock) just sounds dirty. You are never going to get partial data. If an update is changing John to Sally you are never going to get Jolly.
I use shared locks a lot for concurrency. Data is stale as soon as it is read. A read of John that changes to Sally the next millisecond is stale data. A read of Sally that gets rolled back to John the next millisecond is stale data. That is on the millisecond level. I have a dataloader that takes 20 hours to run if users are taking shared locks and 4 hours to run if users are taking no locks. Shared locks in this case cause data to be 16 hours stale.
Don't use nolock in the wrong situations. But it does have a place. If you are going to cut a check when a byte is set to 1 and then set it to 2 when the check is cut - that is not a time for a nolock.
I have to add one important comment. Everyone is mentioning that NOLOCK reads only dirty data. This is not precise. It is also possible that you'll get the same row twice, or that a whole row is skipped during your read. The reason is that you could ask for some data at the same time that SQL Server is re-balancing the b-tree.
Check these other threads:
https://stackoverflow.com/a/5469238/2108874
http://www.sqlmag.com/article/sql-server/quaere-verum-clustered-index-scans-part-iii.aspx
With the NOLOCK hint (or setting the isolation level of the session to READ UNCOMMITTED) you tell SQL Server that you don't expect consistency, so there are no guarantees. Bear in mind though that "inconsistent data" does not only mean that you might see uncommitted changes that were later rolled back, or data changes in an intermediate state of the transaction. It also means that in a simple query that scans all table/index data SQL Server may lose the scan position, or you might end up getting the same row twice.
At my work, we have a very big system that runs on many PCs at the same time, with very big tables with hundreds of thousands of rows, and sometimes many millions of rows.
When you make a SELECT on a very big table, let's say you want to know every transaction a user has made in the past 10 years, and the primary key of the table is not built in an efficient way, the query might take several minutes to run.
Then, our application might be running on many users' PCs at the same time, accessing the same database. So if someone tries to insert into the table that the other SELECT is reading (in pages that SQL is trying to read), then a LOCK can occur and the two transactions block each other.
We had to add a NOLOCK to our SELECT statement, because it was a huge SELECT on a table that is used a lot by a lot of users at the same time and we had LOCKS all the time.
I don't know if my example is clear enough? This is a real life example.
The SELECT WITH (NOLOCK) allows reads of uncommitted data, which is equivalent to running the query at the READ UNCOMMITTED isolation level. The NOLOCK hint allows finer-grained control than setting the isolation level for the entire session.
Wikipedia has a useful article: Wikipedia: Isolation (database systems)
It is also discussed at length in other stackoverflow articles.
A select with NOLOCK will return records that may or may not end up being inserted; you will read dirty data.
For example, let's say a transaction inserts 1000 rows and then fails.
When you select, you will still get the 1000 rows.
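A small sketch that reproduces this (the table and column names are made up):

-- Session 1: insert rows inside a transaction that will later be undone
BEGIN TRAN;
INSERT INTO dbo.Orders (OrderID, Amount)
SELECT n, 100 FROM (VALUES (1), (2), (3)) AS v(n);   -- imagine 1000 rows here

-- Session 2: the dirty read sees the uncommitted rows
SELECT COUNT(*) FROM dbo.Orders WITH (NOLOCK);

-- Session 1: the transaction fails and is rolled back
ROLLBACK;

-- Session 2: the rows the report already counted never existed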
Consider this statement:
update TABLE1
set FormatCode = case T.FormatCode when null then TABLE1.FormatCode else T.FormatCode end,
CountryCode = case T.CountryCode when null then TABLE1.CountryCode else T.CountryCode end
<SNIP ... LOTS of similar fields being updated>
FROM TABLE2 AS T
WHERE TABLE1.KEYFIELD = T.KEYFIELD
TABLE1 is used by other applications and so locking on it should be minimal
TABLE2 is not used by anybody else so I do not care about it.
TABLE1 and TABLE2 contain 600K rows each.
Would the above statement cause a table lock on TABLE1?
How can I modify it to cause minimal locking on it?
Maybe use a cursor to read the rows of TABLE2 one by one and then for each row update the respective row of TABLE1?
SQL will use row locks first. If enough rows in an index page are locked, SQL will issue a page lock. If enough pages are locked, SQL will issue a table lock.
So it really depends on how many locks are issued. You could use the locking hint ROWLOCK in your update statement. The downside is that you will probably have thousands of row locks instead of hundreds of page locks or one table lock. Locks use resources, so while a ROWLOCK hint will probably not issue a table lock, it might even be worse, as it could starve your server of resources and slow it down in any case.
You could batch the update, say 1000 rows at a time, as in the sketch below. Cursors are really going to mess things up even more. Experiment, monitor, analyse the results and make a choice based on the data you have gathered.
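A rough sketch of the batching idea (it assumes KEYFIELD is an integer, which is my assumption; COALESCE expresses the "keep the old value when the source is NULL" intent of your CASE expressions, and only two of the columns are shown):

DECLARE @BatchSize INT = 1000,
        @LastKey   INT = 0,
        @MaxKey    INT;

SELECT @MaxKey = MAX(KEYFIELD) FROM TABLE2;

WHILE @LastKey < @MaxKey
BEGIN
    UPDATE TABLE1
    SET    FormatCode  = COALESCE(T.FormatCode,  TABLE1.FormatCode),
           CountryCode = COALESCE(T.CountryCode, TABLE1.CountryCode)
    FROM   TABLE2 AS T
    WHERE  TABLE1.KEYFIELD = T.KEYFIELD
      AND  T.KEYFIELD >  @LastKey
      AND  T.KEYFIELD <= @LastKey + @BatchSize;

    SET @LastKey = @LastKey + @BatchSize;   -- short transactions keep the lock count per statement low
END;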
As marc_s has suggested introducing a more restrictive WHERE clause to reduce the number of rows should help here.
Since your update occurs nightly, it seems you'll be looking to update only the records that have changed since the previous update occurred (i.e. a day's worth of updates). But this will only benefit you if a subset of the records have changed rather than all of them.
I'd probably try to SELECT out the Ids of the rows that have changed into a temp table and then join the temp table as part of the update. To determine the list of Ids, a couple of options come to mind: making use of a last-changed column on TABLE2 (if TABLE2 has one); alternatively you could compare each field between TABLE1 and TABLE2 to see if they differ (watch for nulls), although this would be a lot of extra SQL to include and probably a maintenance pain. A third option would be to have an UPDATE trigger against TABLE2 to insert the KEYFIELD of the rows into the temp table as they are updated during the day; the temp table could be cleared down following your nightly update.
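A rough sketch of the trigger idea (table, trigger and column types are made up; a permanent staging table is used instead of a true temp table so the rows captured by the trigger survive across sessions):

CREATE TABLE dbo.TABLE2_Changed (KEYFIELD INT NOT NULL PRIMARY KEY);
GO

CREATE TRIGGER trg_TABLE2_Update ON TABLE2
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- capture the keys of rows changed during the day, ignoring keys already captured
    INSERT INTO dbo.TABLE2_Changed (KEYFIELD)
    SELECT i.KEYFIELD
    FROM   inserted AS i
    WHERE  NOT EXISTS (SELECT 1 FROM dbo.TABLE2_Changed c WHERE c.KEYFIELD = i.KEYFIELD);
END;
GO

-- the nightly update touches only the changed keys, then the staging table is cleared
UPDATE TABLE1
SET    FormatCode = COALESCE(T.FormatCode, TABLE1.FormatCode)   -- remaining columns as in the question
FROM   TABLE2 AS T
       JOIN dbo.TABLE2_Changed AS c ON c.KEYFIELD = T.KEYFIELD
WHERE  TABLE1.KEYFIELD = T.KEYFIELD;

TRUNCATE TABLE dbo.TABLE2_Changed;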