Preventing entire table from locking while bulk INSERT - sql-server

I have a stored procedure that performs a bulk insert into a table. I added a BEGIN TRANSACTION command just above the INSERT query so I can ROLLBACK if something goes wrong. When the bulk insert started, it locked the entire table, and other users were unable to run SELECT statements against the same table.
I don't understand why SQL Server locks the entire table, blocking even SELECTs.
I am using SQL Server 2005 Express. Is this a problem specific to this version, or does it persist in 2008 as well? How can I overcome this? Writers should not block readers.

Writers should not block Readers
This is true only under snapshot isolation; all other isolation levels require both readers to block writers and writers to block readers (dirty reads aside, since they are inconsistent and should never be used). If you need this behavior, use row versioning (the link contains the solution).
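For example, a minimal sketch of opting in to snapshot isolation (the database and table names are placeholders):

-- Enable row versioning for the database (placeholder name)
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Each reading session must then opt in explicitly:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
SELECT * FROM dbo.MyTable; -- reads the last committed row versions without blocking writers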
Why does bulk insert lock the entire table?
This actually may or may not be true. The behavior is under your control:
TABLOCK
Specifies that a table-level lock is acquired for the duration of
the bulk-import operation. A table can be loaded concurrently by
multiple clients if the table has no indexes and TABLOCK is specified.
By default, locking behavior is determined by the table option table
lock on bulk load.
For more details, see the documentation: Controlling Locking Behavior for Bulk Import.
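As an illustration of the opt-in nature of the hint (a sketch; the table and file names are placeholders):

-- Default: locking is governed by the "table lock on bulk load" table option
BULK INSERT dbo.TargetTable
FROM 'C:\data\load.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

-- With TABLOCK: a table-level lock is held for the duration of the load
BULK INSERT dbo.TargetTable
FROM 'C:\data\load.csv'
WITH (TABLOCK, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');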

You have an open transaction. That means SQL Server needs to preserve the state of the table, and any changes you are in the process of making are "dirty" and uncommitted.
If you SELECT from a table that is currently being altered with an open (explicit) transaction, the SELECT will wait until the table is in a stable state and the transaction has been either committed or rolled back.
To get around this, you can alter the transaction isolation level on the SELECT query.
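For example (a sketch; the table name is a placeholder, and whether dirty reads are acceptable is your call):

-- Allow the SELECT to read uncommitted ("dirty") rows instead of waiting
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT * FROM dbo.TargetTable;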

If you're specifying TABLOCK in your proc, don't.

Related

Row locking behaviour while updating

In Oracle databases I can start a transaction and update a row without committing. Selecting this row in another session still returns the current ("old") value.
How do I get this behaviour in SQL Server? Currently, the row is locked until the transaction ends. Using WITH (NOLOCK) in the SELECT statement returns the new value from the uncommitted transaction, which is potentially dangerous.
Starting the transaction without committing:
BEGIN TRAN;
UPDATE test SET val = 'Updated' WHERE id = 1;
This works:
SELECT * FROM test WHERE id = 2;
This waits for the transaction to be committed:
SELECT * FROM test WHERE id = 1;
With Read Committed Snapshot Isolation (RCSI), row versions are kept in a version store, so while a transaction is open, readers can see each row as it existed at the time their statement started, before any uncommitted changes, without taking shared locks on rows or pages and without blocking writers or other readers. From this post by Paul White:
To summarize, locking read committed sees each row as it was at the time it was briefly locked and physically read; RCSI sees all rows as they were at the time the statement began. Both implementations are guaranteed to never see uncommitted data.
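That statement-level behavior can be sketched with the test table from above (the comments describe what RCSI would show):

BEGIN TRAN;
SELECT val FROM test WHERE id = 1; -- snapshot as of this statement
-- ...another session commits an update to the row here...
SELECT val FROM test WHERE id = 1; -- a new statement, so it may see the new value
COMMIT;
-- Under SNAPSHOT isolation, by contrast, both SELECTs would return the same
-- version, because the snapshot is taken once for the entire transaction.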
One cost, of course, is that if you read a prior version of the row, it can change (even many times) before you're done doing whatever it is you plan to do with it. If you're making important decisions based on some past version of the row, it may be the case that you actually want an isolation level that forces you to wait until all changes have been committed.
Another cost is that version store is not free... it requires space and I/O in tempdb, so if tempdb is already a bottleneck on your system, this is something worth testing.
(In SQL Server 2019, with Accelerated Database Recovery, the version store shifts to the user database, which increases database size but mitigates some of the tempdb contention.)
Paul's post goes on to explain some other risks and caveats.
In almost all cases, this is still way better than NOLOCK, IMHO. Lots of links about the dangers there (and why RCSI is better) here:
I'm using NOLOCK; is that bad?
And finally, from the documentation (adding one clarification from the comments):
When the READ_COMMITTED_SNAPSHOT database option is set ON, read committed isolation uses row versioning to provide statement-level read consistency. Read operations require only SCH-S table level locks and no page or row locks. That is, the SQL Server Database Engine uses row versioning to present each statement with a transactionally consistent snapshot of the data as it existed at the start of the statement. Locks are not used to protect the data from updates by other transactions. A user-defined function can return data that was committed after the time the statement containing the UDF began.

When the READ_COMMITTED_SNAPSHOT database option is set OFF, which is the default setting *on-prem but not in Azure SQL Database*, read committed isolation uses shared locks to prevent other transactions from modifying rows while the current transaction is running a read operation. The shared locks also block the statement from reading rows modified by other transactions until the other transaction is completed. Both implementations meet the ISO definition of read committed isolation.
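To see this in practice with the example above (a sketch; TestDb is a placeholder, and the ALTER needs exclusive access to the database, hence ROLLBACK IMMEDIATE):

ALTER DATABASE TestDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;

-- Session 1:
BEGIN TRAN;
UPDATE test SET val = 'Updated' WHERE id = 1;

-- Session 2: returns the last committed value of the row, without blocking
SELECT * FROM test WHERE id = 1;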

Deadlock occurring in a clustered columnstore index

We are using a clustered columnstore index on our transaction table holding order fulfillments. The table is regularly updated by different sessions, but every session is specific to an order job number, so they never try to update the same row at the same time. Even so, we are facing deadlocks between sessions in the scenarios below:
Row group locking & Page lock
Row group locking & Row group locking
This is not specific to a stored procedure. It is due to multiple stored procedures updating this table, sequentially one by one, as part of order fulfillment.
The sample schema of the table is very simple:
CREATE TABLE OrderFulfillments
(
OrderJobNumber INT NOT NULL,
FulfilledIndividualID BIGINT NOT NULL,
IsIndividualSuppressed BIT NOT NULL,
SuppressionReason VARCHAR(100) NULL
)
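(The clustered columnstore index itself would be created along these lines; the index name is hypothetical:)

CREATE CLUSTERED COLUMNSTORE INDEX CCI_OrderFulfillments
ON dbo.OrderFulfillments;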
I have included a sample deadlock graph for your reference. Please let me know what approach I can take to avoid this deadlock situation. We need the clustered columnstore index on this table, as we run aggregations to see how many times an individual has already been fulfilled; without the columnstore index, those would be slower.
In my case, the deadlocks were caused by lock escalation: some of the fulfillments were very large (in the 10,000s or even 100k ranges), which escalated locks to the row-group level and, in some cases, the page level.
I solved the issue by creating a temporary table at the very beginning of the transaction, doing the updates on the temporary table, and finally inserting the finished fulfillment rows into OrderFulfillments. The process still reads OrderFulfillments to see how many times an individual has already been fulfilled, but those reads take only shared locks, not exclusive ones.
With a temporary table, every session works on its own copy, and the concurrency issues are resolved.
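A sketch of that pattern (the source table and job-number variable are hypothetical; the column names come from the schema above):

DECLARE @JobNumber INT = 42; -- hypothetical job number

BEGIN TRAN;

-- Build the session's private working set; no other session can touch it
SELECT OrderJobNumber, FulfilledIndividualID, IsIndividualSuppressed, SuppressionReason
INTO #Fulfillments
FROM dbo.IncomingFulfillments -- hypothetical source of new fulfillments
WHERE OrderJobNumber = @JobNumber;

-- ...apply suppression logic to #Fulfillments here; reads of OrderFulfillments
-- for prior-fulfillment counts take only shared locks...

-- Publish the finished rows in a single statement
INSERT INTO OrderFulfillments (OrderJobNumber, FulfilledIndividualID, IsIndividualSuppressed, SuppressionReason)
SELECT OrderJobNumber, FulfilledIndividualID, IsIndividualSuppressed, SuppressionReason
FROM #Fulfillments;

COMMIT;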
You assume NOLOCK is the same as no locking... that is incorrect.
NOLOCK is equivalent to READUNCOMMITTED.
• READUNCOMMITTED and NOLOCK hints apply only to data locks.
All queries, including those with READUNCOMMITTED and NOLOCK hints,
acquire Sch-S (schema stability) locks during compilation and
execution. Because of this, queries are blocked when a concurrent
transaction holds a Sch-M (schema modification) lock on the table.
For example, a data definition language (DDL) operation acquires a Sch-M
lock before it modifies the schema information of the table.
Any concurrent queries, including those running with READUNCOMMITTED or
NOLOCK hints, are blocked when attempting to acquire a Sch-S lock.
Conversely, a query holding a Sch-S lock blocks a concurrent
transaction that attempts to acquire a Sch-M lock.
READUNCOMMITTED and NOLOCK cannot be specified for tables modified by
insert, update, or delete operations. The SQL Server query optimizer
ignores the READUNCOMMITTED and NOLOCK hints in the FROM clause that
apply to the target table of an UPDATE or DELETE statement.
You can minimize locking contention while protecting transactions from
dirty reads of uncommitted data modifications by using either of the
following:
• The READ COMMITTED isolation level with the READ_COMMITTED_SNAPSHOT database option set ON.
• The SNAPSHOT isolation level.
For more information about isolation levels, see SET TRANSACTION ISOLATION LEVEL (Transact-SQL).
https://learn.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-table
Understand how your indexes are structured: blocking can occur if, say, a SELECT statement needs an entire page that your UPDATE is modifying concurrently.
Limit your variables when testing.
Consider splitting your DML into batches; you may find an optimal range for performing concurrent modifications of your table data, as sketched below.
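For example, a batched update along these lines (a sketch; the predicate, variable names, and batch size are hypothetical) keeps each statement's lock count below the escalation threshold of roughly 5,000 locks:

DECLARE @JobNumber INT = 42; -- hypothetical job number
DECLARE @BatchSize INT = 1000; -- tune against your workload

WHILE 1 = 1
BEGIN
    -- Each iteration touches at most @BatchSize rows, so locks stay granular
    UPDATE TOP (@BatchSize) dbo.OrderFulfillments
    SET IsIndividualSuppressed = 1
    WHERE OrderJobNumber = @JobNumber
      AND IsIndividualSuppressed = 0;

    IF @@ROWCOUNT < @BatchSize BREAK; -- last partial batch done
END;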

We only have SQL Server Standard edition so no access to snapshot functionality. Options?

We only have SQL Server Standard edition, so I can't use the snapshot functionality. Before spending the time, I just want to know if the following is possible (or if there is a better way), please:
At the end of every month I need to take a snapshot of the month's data and store it in table B. The following month, take another snapshot and append that snapshot's data to table B. And so on....
Is it possible to create a stored procedure that runs at the end of every month and stores the snapshot data in a temp table A, then, with another stored procedure, takes the data from temp table A and appends it to table B? The second procedure can then drop table A.
Cheers.
Yes, it is possible.
If I understand you, more or less, this is what you want:
Lock the table
Select everything into a staging table
Move everything from that staging table into your destination
You can lock the entire table (this will prevent changes, but can lead to deadlocks).
INSERT INTO stagingTable (
... -- field list
)
SELECT
... -- field list
FROM
myTable WITH (TABLOCK)
;
TABLOCK places a shared lock on the table, which is released when the statement completes (READ COMMITTED isolation level) or when the transaction is committed or rolled back (SERIALIZABLE).
If you want to keep the lock for the whole transaction, add the HOLDLOCK hint too; it switches the isolation level to serializable for that object, so the lock is released only after COMMIT. Don't forget to start a transaction and commit or roll it back, as in the sketch below.
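Putting it together (a sketch; SELECT * stands in for the explicit field lists above, and tableB is the destination table from the question):

BEGIN TRAN;

INSERT INTO stagingTable
SELECT *
FROM myTable WITH (TABLOCK, HOLDLOCK); -- shared table lock held until COMMIT

INSERT INTO tableB
SELECT *
FROM stagingTable;

COMMIT;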
You can also use TABLOCKX, an exclusive lock that prevents other processes from acquiring any lock on the table or on anything at lower levels (pages, rows, etc.) within it. This will prevent concurrent reads too!
Finally, you can let SQL Server decide which lock to use (i.e., omit the hint); in that case it may choose more granular locks (such as page or row locks) instead of locking the whole table.

SQL Server Isolation Levels - Repeatable Read

I'm having problems getting my head round why this is happening. Pretty sure I understand the theory, but something else must be going on that I don't see.
Table A has the following schema:
ID [Primary Key]
Name
Type [Foreign Key]
SprocA sets Isolation Level to Repeatable Read, and Selects rows from Table A that have Type=1. It also updates these rows.
SprocB selects rows from Table A that have Type=2.
Now, given that these are completely different rowsets, if I execute both at the same time (with WAITFOR calls to slow things down), SprocB doesn't complete until SprocA does.
I know it's down to the query on Type, because if I select based on the primary ID, the table allows concurrent access.
Anyone shed any light?
Cheers
With the isolation level set to Repeatable Read, you hold a shared lock on all data you read until the transaction completes, that is, until you COMMIT or ROLLBACK.
This lowers the concurrency of your application's access to the data. So if your first procedure SELECTs from the table, calls WAITFOR, then SELECTs again, and so on, within a transaction, it holds those shared locks the entire time, until the transaction commits or the procedure completes.
If this is a test procedure you are working with, try adding a COMMIT after each SELECT and see if that helps the second procedure run concurrently.
Good luck!
Kevin
SQL Server uses indexes to take range locks (which is what repeatable reads often use), so if you don't have an index on Type, it may well lock the entire table...
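If that is the case, a narrow nonclustered index on the filtering column (the table and index names are hypothetical stand-ins for Table A) would let the engine lock only the matching key range:

CREATE NONCLUSTERED INDEX IX_TableA_Type
ON dbo.TableA ([Type]);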
The thing to remember is that the locked rows are black boxes to the other process.
You know that SprocA is only reading rows with type = 1 and that SprocB is only reading rows with type = 2.
However, SprocB does not know what SprocA is going to do to those records. Before the transaction completes, SprocA may update all of its records to type = 2. In that case, SprocB would work incorrectly if it did not wait for SprocA to complete.
Maintaining concurrency when performing range locks / bulk changes is tough.

SQL Server SELECT statements causing blocking

We're using a SQL Server 2005 database (no row versioning) with a huge select statement, and we're seeing it block other statements from running (seen using sp_who2). I didn't realise SELECT statements could cause blocking - is there anything I can do to mitigate this?
SELECT can block updates. A properly designed data model and query will only cause minimal blocking and not be an issue. The 'usual' WITH NOLOCK hint is almost always the wrong answer. The proper answer is to tune your query so it does not scan huge tables.
If the query cannot be tuned, you should first consider the SNAPSHOT ISOLATION level, second consider using DATABASE SNAPSHOTS, and only as a last option DIRTY READS (and it is better to change the isolation level than to use the NOLOCK hint). Note that dirty reads, as the name clearly states, will return inconsistent data (e.g., your total sheet may be unbalanced).
From documentation:
Shared (S) locks allow concurrent transactions to read (SELECT) a resource under pessimistic concurrency control. For more information, see Types of Concurrency Control. No other transactions can modify the data while shared (S) locks exist on the resource. Shared (S) locks on a resource are released as soon as the read operation completes, unless the transaction isolation level is set to repeatable read or higher, or a locking hint is used to retain the shared (S) locks for the duration of the transaction.
A shared lock is compatible with another shared lock or an update lock, but not with an exclusive lock.
That means that your SELECT queries will block UPDATE and INSERT queries and vice versa.
A SELECT query places a temporary shared lock when it reads a block of values from the table, and removes it when it is done reading.
For as long as the lock exists, you will not be able to modify the data in the locked area.
Two SELECT queries will never block each other (unless one of them takes a more restrictive lock with a hint such as UPDLOCK or XLOCK; SQL Server has no SELECT FOR UPDATE).
You can enable the SNAPSHOT isolation level on your database, but note that a session must explicitly opt in with SET TRANSACTION ISOLATION LEVEL SNAPSHOT; a SELECT that still runs under the default READ COMMITTED level keeps taking shared locks and can still block UPDATE queries (which seems to be your case).
A SELECT that runs under SNAPSHOT isolation, however, takes no shared locks, so it neither blocks UPDATEs nor is blocked by them.
Also note that SQL Server, unlike Oracle, uses a lock manager and keeps its locks in in-memory structures.
That means that under heavy load, the mere act of placing and removing locks can be slow, since those shared structures must themselves be synchronized across transaction threads.
To perform dirty reads you can either:
// Requires a reference to the System.Transactions assembly
using System.Transactions;

// ...

using (var scope = new TransactionScope(TransactionScopeOption.Required,
    new TransactionOptions { IsolationLevel = System.Transactions.IsolationLevel.ReadUncommitted }))
{
    // Your code here
    scope.Complete(); // without this, the scope rolls back when disposed
}
or
SelectCommand = "SELECT * FROM Table1 WITH (NOLOCK) INNER JOIN Table2 WITH (NOLOCK) ..."
Remember that you have to write WITH (NOLOCK) after every table you want to dirty-read.
You could set the transaction isolation level to READ UNCOMMITTED.
You might also get deadlocks:
"deadlocks involving only one table"
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/01/reproducing-deadlocks-involving-only-one-table.aspx
and/or incorrect results:
"Selects under READ COMMITTED and REPEATABLE READ may return incorrect results."
http://www2.sqlblog.com/blogs/alexander_kuznetsov/archive/2009/04/10/selects-under-read-committed-and-repeatable-read-may-return-incorrect-results.aspx
You can use the WITH (READPAST) table hint. It's different from WITH (NOLOCK): instead of returning uncommitted data, READPAST simply skips any rows that are currently locked by other transactions, so your statement does not block, but its result set omits the locked rows.
SELECT * FROM table1 WITH (READPAST)
