Why does UPDATE block SELECT on unrelated rows? - sql-server

Given the table defined by script [1], I execute the following scripts in two SSMS windows:
--1) run first, in the first SSMS window
set transaction isolation level READ UNCOMMITTED;
begin transaction;
update aaa set Name ='bbb'
where id=1;
-- results in "(1 row(s) affected)"
--rollback
and then, in the second window:
--2) after launching 1)
select * from aaa
where id<>1
--is blocked
Regardless of the transaction isolation level in window 1), the SELECT in 2) is blocked.
Why?
Does the isolation level of the UPDATE have any influence on statements in other transactions?
Window 2) runs at the default isolation level, READ COMMITTED, at most.
Since no range locks are taken, the SELECT should merely have been exposed to non-repeatable reads (under READ COMMITTED) or phantom reads (under REPEATABLE READ) [2], not blocked.
How can I make it suffer from those anomalies instead?
How can the UPDATE be made without blocking the SELECT?
[1]
CREATE TABLE aaa
(
Id int IDENTITY(1,1) NOT NULL,
Name varchar(13) NOT NULL
)
insert into aaa(Name)
select '111' union all
select '222' union all
select '333' union all
select '444' union all
select '555' union all
select '666' union all
select '777' union all
select '888'
[2]
http://en.wikipedia.org/wiki/Isolation_(database_systems)
Update:
A SELECT run WITH (NOLOCK) is not blocked...
Update2:
or, equivalently, it is not blocked under READ UNCOMMITTED.
Note that the UPDATE is on a different row than the SELECT.
Even if it were on the same row, this behavior contradicts the description of the isolation levels [2].
The points are that:
I suppose I cannot know who else is going to SELECT from the same (updated) table, but on rows unrelated to the update;
I want to understand the isolation levels [2].
SQL Server 2008 R2 Dev
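
One standard answer to the last question is row versioning; a minimal sketch, assuming changing the database option is acceptable (MyDb is a placeholder database name):

ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;
-- with this option on, READ COMMITTED readers use row versions instead of
-- shared locks, so the SELECT in 2) no longer waits for the open UPDATE in 1)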

I believe it's because you don't have a primary key, which I think results in the locks being escalated and hence blocks the SELECT. If you add a PRIMARY KEY to the Id column and try again, you will notice that the SELECT now returns the remaining rows - no WITH (NOLOCK) hint needed.
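
A quick way to try that suggestion (the constraint name PK_aaa is just a name chosen for this sketch):

-- add a primary key (backed by a unique clustered index) on Id,
-- then rerun 1) and 2)
ALTER TABLE aaa ADD CONSTRAINT PK_aaa PRIMARY KEY (Id);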

Repeating the tests after
--3)
create index IX_aaa_ID on aaa(id)
the SELECT in 2) is still blocked.
--4)
drop index IX_aaa_ID on aaa
create unique index IX_aaa_ID on aaa(id)
--or add a primary key constraint instead
the SELECT in 2) is NOT blocked.
If 2) is modified as
--2b)
select * from aaa
where id=3
--or as
--WHERE id=2
then 2b) is not blocked, even in the absence of any index or PK.
However, without any indexes, 2b) is blocked after modifying the UPDATE in 1) to run under SERIALIZABLE,
but not under REPEATABLE READ or lower:
--1c)
set transaction isolation level serializable;
--set transaction isolation level REPEATABLE READ;
begin transaction;
update aaa set Name ='bbb'
where id=1;
--rollback
So it looks like a multi-row SELECT attempts to acquire a non-shareable lock?
Update:
Well, in all cases where the SELECT is blocked, it is waiting to acquire an LCK_M_IS (intent shared) lock.
A good reason to understand these inner workings.
Update2:
Well, it is not the UPDATE's lock that is escalated on the table; it is the SELECT's (shared) locks, taken when the SELECT tries to read multiple rows, that are escalated to a table lock, which cannot be granted because the table already holds an exclusive (UPDATE) lock.
And the presence or absence of an index was unrelated to my primary question.
I shift the discussion of this topic to my submitted suggestion "Intent rowlocks should not be escalated to a table lock if a table already contains exclusive lock".
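
As a footnote to these observations, a sketch of a lock-inspection query, run from a third connection while 1) and 2) are active; look for the granted X/IX locks of the UPDATE session and the waiting intent-shared (IS) request of the SELECT session:

SELECT request_session_id, resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID()
ORDER BY request_session_id, resource_type;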

Related

Is it possible to produce phantom read in single SQL Server query?

All the explanations of phantom reads I managed to find demonstrate the phantom read by running two SELECT statements in one transaction (e.g. https://blobeater.blog/2017/10/26/sql-server-phantom-reads/):
BEGIN TRAN
SELECT #1
DELAY DURING WHICH AN INSERT TAKES PLACE IN A DIFFERENT TRANSACTION
SELECT #2
END TRAN
Is it possible to reproduce a phantom read in one SELECT statement? This would mean that the SELECT statement starts in transaction #1; then an INSERT runs in transaction #2 and commits; finally the SELECT statement in transaction #1 completes, but does not return the row that transaction #2 inserted.
The SQL Server Transaction Isolation Levels documentation defines a phantom row as one "that matches the search criteria but is not initially seen" (emphasis mine). Consequently, more than one SELECT statement is needed for a phantom read to occur.
Data inserted during SELECT statement execution might not be returned under the READ COMMITTED isolation level, depending on the timing, but this is not a phantom read by definition. The example below shows this behavior.
--create table with enough data for a long-running SELECT query
CREATE TABLE dbo.PhantomReadExample(
PhantomReadExampleID int NOT NULL
CONSTRAINT PK_PhantomReadExample PRIMARY KEY
, PhantomReadData char(8000) NOT NULL
);
--insert 100K rows
WITH
t10 AS (SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) t(n))
,t1k AS (SELECT 0 AS n FROM t10 AS a CROSS JOIN t10 AS b CROSS JOIN t10 AS c)
,t1m AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS num FROM t1k AS a CROSS JOIN t1k AS b)
INSERT INTO dbo.PhantomReadExample WITH(TABLOCKX) (PhantomReadExampleID, PhantomReadData)
SELECT num*2, 'data'
FROM t1m
WHERE num <= 100000;
GO
--run this on connection 1
SELECT *
FROM dbo.PhantomReadExample
ORDER BY PhantomReadExampleID;
GO
--run this on connection 2 while the connection 1 SELECT is running
INSERT INTO dbo.PhantomReadExample(PhantomReadExampleID, PhantomReadData)
VALUES(1, 'data');
GO
Shared locks are acquired on rows as they are read during the SELECT query scan to ensure only committed data are read, but each lock is released immediately after the row is read, to improve concurrency. This allows other sessions to insert, update, and delete rows while the SELECT query is running.
The inserted row is not returned in this case because the ordered clustered index scan had already passed the point of the insert.
Below is the Wikipedia definition of phantom reads:
A phantom read occurs when, in the course of a transaction, new rows
are added by another transaction to the records being read.
This can occur when range locks are not acquired on performing a
SELECT ... WHERE operation. The phantom reads anomaly is a special
case of Non-repeatable reads when Transaction 1 repeats a ranged
SELECT ... WHERE query and, between both operations, Transaction 2
creates (i.e. INSERT) new rows (in the target table) which fulfill
that WHERE clause.
This is certainly possible to reproduce in a single reading query (of course other database activity must also be happening to produce the phantom rows).
Setup
CREATE TABLE Test(X INT PRIMARY KEY);
Connection 1 (leave this running)
SET NOCOUNT ON;
WHILE 1 = 1
INSERT INTO Test VALUES (CRYPT_GEN_RANDOM(4))
Connection 2
This is extremely likely to return some rows if run at the locking READ COMMITTED isolation level (the default for the on-premises product, enforced here with the table hint below):
WITH CTE AS
(
SELECT *
FROM Test WITH (READCOMMITTEDLOCK)
WHERE X BETWEEN 0 AND 2147483647
)
SELECT *
FROM CTE c1
FULL OUTER HASH JOIN CTE c2 ON c1.X = c2.X
WHERE (c1.X IS NULL OR c2.X IS NULL)
The returned rows are values added between the first and second read of the table for rows matching the WHERE X BETWEEN 0 AND 2147483647 predicate.
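
For contrast, a sketch of the same query with a HOLDLOCK hint (statement-level SERIALIZABLE semantics for that table reference): the key-range locks should make the concurrent inserts wait instead of slipping in between the two reads, so this variant is expected to return no rows:

WITH CTE AS
(
    SELECT *
    FROM Test WITH (HOLDLOCK) -- key-range locks block inserts into the range
    WHERE X BETWEEN 0 AND 2147483647
)
SELECT *
FROM CTE c1
FULL OUTER HASH JOIN CTE c2 ON c1.X = c2.X
WHERE (c1.X IS NULL OR c2.X IS NULL)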

Does a long-running query prevent an insert to tables involved?

Say I have two tables:
Table 1
[Clustered Id] [Text Field]
Table 2
[Clustered Id] [Numeric Field]
Then I have a query:
select *
from [Table 1]
,[Table 2]
where [Table 1].[Clustered Id] = [Table 2].[Clustered Id]
and [Table 1].[Text Field] like '%some string%'
Say my insert inserts one row, and looks like this:
insert into [Table 2]
values (new clustered ID)
,-182
If this query takes a long time to run, would an insert to [Table 2] be possible during that time? If so, what are the nuances? If not, what could I do to avoid it?
Yes, a SELECT will take shared locks that can block an update's exclusive lock.
You could use the WITH (NOLOCK) hint on the SELECT so that it takes no shared locks and does not block the update. But bad things can happen with dirty reads, and a lot of people on this site will tell you never to do that.
If the update is only taking a row lock, then only that row is blocked.
On an UPDATE it really helps to mirror the SET value with a <> predicate in the WHERE clause, so rows that already hold the target value are neither updated nor locked:
update table1
set col1 = 12
where col3 = 56
and col1 <> 12 -- rows already equal to 12 are not updated, so no update lock is taken
An INSERT is different, as it would only block on a page lock or a table lock.
Please post your insert and how many rows you are inserting.
If the SELECT takes a table lock, then I think inserts would be blocked; but even with REPEATABLE READ, I don't think a SELECT would block an INSERT.
Unless you are at the SERIALIZABLE isolation level, you don't need to worry: your SELECTs won't block INSERTs.
A SELECT acquires shared locks. At a low level, SQL Server requires an exclusive lock on the row it is trying to insert, and an exclusive lock is not compatible with a shared lock. But then, how could a SELECT be blocked by an INSERT of a row that does not exist yet?
The isolation level determines how long the SELECT's locks are held. At the normal isolation levels, a shared lock is released as soon as the row has been read.
Only under SERIALIZABLE are range locks taken, and those are not released until the transaction containing the SELECT completes.
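
To illustrate that last point, a sketch using the tables from the question (the key range and values are made up for this example): under SERIALIZABLE the SELECT takes key-range locks, and a concurrent INSERT into the locked range blocks until the reading transaction ends:

-- connection 1
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
SELECT * FROM [Table 2] WHERE [Clustered Id] BETWEEN 100 AND 200;
-- key-range locks are now held until COMMIT/ROLLBACK

-- connection 2: this blocks until connection 1 commits
INSERT INTO [Table 2] VALUES (150, -182);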

Set-based bulk import of denormalized data into normalized SQL Server 2014 database tables

The following simplified mock-up works fine for a set-based bulk insert of the denormalised data in #BulkData (improvement suggestions welcome):
IF OBJECT_ID('tempdb..#Things') IS NOT NULL
DROP TABLE #Things
IF OBJECT_ID('tempdb..#Categories') IS NOT NULL
DROP TABLE #Categories
IF OBJECT_ID('tempdb..#ThingsToCategories') IS NOT NULL
DROP TABLE #ThingsToCategories
IF OBJECT_ID('tempdb..#BulkData') IS NOT NULL
DROP TABLE #BulkData
CREATE TABLE #Things
(
ThingId INT IDENTITY(1,1) PRIMARY KEY,
ThingName NVARCHAR(255)
)
CREATE TABLE #Categories
(
CategoryId INT IDENTITY(1,1) PRIMARY KEY,
CategoryName NVARCHAR(255)
)
CREATE TABLE #ThingsToCategories
(
ThingId INT,
CategoryId INT
)
CREATE TABLE #BulkData
(
ThingName NVARCHAR(255),
CategoryName NVARCHAR(255)
)
-- the following would be done from a flat file via a bulk import
INSERT INTO #BulkData
SELECT N'Thing1', N'Category1'
UNION
SELECT N'Thing2', N'Category1'
UNION
SELECT N'Thing3', N'Category2'
INSERT INTO #Categories
SELECT DISTINCT CategoryName
FROM #BulkData
WHERE CategoryName NOT IN (SELECT DISTINCT CategoryName
FROM #Categories)
INSERT INTO #Things
SELECT DISTINCT ThingName
FROM #BulkData
WHERE ThingName NOT IN (SELECT DISTINCT ThingName FROM #Things)
INSERT INTO #ThingsToCategories
SELECT ThingId, CategoryId
FROM #BulkData
INNER JOIN #Things ON #BulkData.ThingName = #Things.ThingName
INNER JOIN #Categories ON #BulkData.CategoryName = #Categories.CategoryName
SELECT * FROM #Categories
SELECT * FROM #Things
SELECT * FROM #ThingsToCategories
One issue I have with the above is that data in #Things would be accessible before data are inserted into #ThingsToCategories.
Can I wrap the above in a transaction (?) to only make #Things available when the whole bulk import has finished?
Like so:
BEGIN TRANSACTION X
-- insert into all normalised tables
COMMIT TRANSACTION X
Does this work with a couple of million records though?
I guess one could also lower the logging level?
Can I wrap the above in a transaction (?) to only make #Things available when the whole bulk import has finished? Like so:
BEGIN TRANSACTION X
-- insert into all normalised tables
COMMIT TRANSACTION X
The answer is Yes. From the Documentation on Transactions:
A transaction is a single unit of work. If a transaction is successful, all of the data modifications made during the transaction are committed and become a permanent part of the database. If a transaction encounters errors and must be canceled or rolled back, then all of the data modifications are erased.
Transactions have the following four standard properties, usually referred to by the acronym ACID. Quoting from the tutorialspoint.com page on SQL transactions:
Atomicity: ensures that all operations within the work unit are completed successfully; otherwise, the transaction is aborted at the point of failure, and previous operations are rolled back to their former state.
Consistency: ensures that the database properly changes states upon a successfully committed transaction.
Isolation: enables transactions to operate independently of and transparent to each other.
Durability: ensures that the result or effect of a committed transaction persists in case of a system failure.
Will this work with a few million entries?
Again, yes. The number of entries is irrelevant. In my own words this time:
Atomicity: If a transaction succeeds all the operations within the transaction will take effect as soon as the transaction completes, i.e. at the time the transaction is committed. If at least one of the operations in the transaction fails, all operations are rolled back (in other words, none take hold). The amount of operations within the transaction is irrelevant.
Isolation: Other transactions won't see a transaction's operations until that transaction is committed.
There are however different Transaction Isolation Levels. The default for SQL Server is READ COMMITTED:
Specifies that statements cannot read data that has been modified but not committed by other transactions. [...]
This is a trade-off level balancing performance and consistency. Ideally you would want everything SERIALIZABLE (see the documentation; too long to copy/paste here). That level trades performance (-) for consistency (+). In many cases the READ COMMITTED isolation level is good enough, but you should be aware of how it works and weigh that against how your transaction is supposed to behave vis-à-vis other concurrently running transactions.
Note also that a transaction will put locks on database objects (rows, table, schema...) and that other transactions will block if they want to read or modify those objects (depending on the type of lock). For that reason it is preferable to keep the amount of operations within a transaction low. Sometimes though, transactions just do a lot of things and they can't be broken up.
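
Applied to the mock-up in the question, a sketch of the wrapped import; SET XACT_ABORT ON makes any runtime error roll back the whole transaction, so #Things and #ThingsToCategories become visible to other readers together:

SET XACT_ABORT ON; -- any error aborts and rolls back the entire transaction
BEGIN TRANSACTION;

INSERT INTO #Categories
SELECT DISTINCT CategoryName
FROM #BulkData
WHERE CategoryName NOT IN (SELECT CategoryName FROM #Categories);

INSERT INTO #Things
SELECT DISTINCT ThingName
FROM #BulkData
WHERE ThingName NOT IN (SELECT ThingName FROM #Things);

INSERT INTO #ThingsToCategories
SELECT ThingId, CategoryId
FROM #BulkData
INNER JOIN #Things ON #BulkData.ThingName = #Things.ThingName
INNER JOIN #Categories ON #BulkData.CategoryName = #Categories.CategoryName;

COMMIT TRANSACTION;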

In tsql is an Insert with a Select statement safe in terms of concurrency?

In my answer to this SO question I suggest using a single insert statement, with a select that increments a value, as shown below.
Insert Into VersionTable
(Id, VersionNumber, Title, Description, ...)
Select @ObjectId, max(VersionNumber) + 1, @Title, @Description
From VersionTable
Where Id = @ObjectId
I suggested this because I believe that this statement is safe in terms of concurrency, in that if another insert for the same object id is run at the same time, there is no chance of having duplicate version numbers.
Am I correct?
As Paul writes: no, it's not safe, to which I would like to add empirical evidence. Create a table Table_1 with one field ID and one record with value 0. Then execute the following code simultaneously in two Management Studio query windows:
declare @counter int
set @counter = 0
while @counter < 1000
begin
set @counter = @counter + 1
INSERT INTO Table_1
SELECT MAX(ID) + 1 FROM Table_1
end
Then execute
SELECT ID, COUNT(*) FROM Table_1 GROUP BY ID HAVING COUNT(*) > 1
On my SQL Server 2008, one ID (662) was created twice. Thus, the default isolation level applied to single statements is not sufficient.
EDIT: Clearly, wrapping the INSERT in BEGIN TRANSACTION and COMMIT won't fix it, since the default isolation level for transactions is still READ COMMITTED, which is not sufficient. Note that setting the transaction isolation level to REPEATABLE READ is also not sufficient. The only way to make the above code safe is to add
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
at the top. This, however, caused deadlocks every now and then in my tests.
EDIT: The only solution I found that is safe and does not produce deadlocks (at least in my tests) is to explicitly lock the table exclusively (the default transaction isolation level is sufficient here). Beware though; this solution might kill performance:
...loop stuff...
BEGIN TRANSACTION
SELECT * FROM Table_1 WITH (TABLOCKX, HOLDLOCK) WHERE 1=0
INSERT INTO Table_1
SELECT MAX(ID) + 1 FROM Table_1
COMMIT
...loop end...
The default isolation level of READ COMMITTED makes this unsafe: if two of these run in perfect parallel, you will get a duplicate, since no read lock is held.
You need the REPEATABLE READ or SERIALIZABLE isolation level to make it safe.
I think your assumption is incorrect. When you query VersionTable, you are only putting a read lock on the row. This does not prevent other users from reading the same row from the same table. Therefore, it is possible for two processes to read the same row in VersionTable at the same time and generate the same VersionNumber value.
You need a unique constraint on (Id, VersionNumber) to enforce it
I'd use ROWLOCK, XLOCK hints to block other folks from reading the locked row where you calculate the version,
or wrap the INSERT in a TRY/CATCH: if you get a duplicate, try again...
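
A common variant of that idea is to serialize only the key range being read, with UPDLOCK and HOLDLOCK hints, instead of raising the session isolation level; a sketch (COALESCE is added here so the first version of an object starts at 1):

BEGIN TRANSACTION;

INSERT INTO VersionTable (Id, VersionNumber, Title, Description)
SELECT @ObjectId,
       COALESCE(MAX(VersionNumber), 0) + 1,
       @Title,
       @Description
FROM VersionTable WITH (UPDLOCK, HOLDLOCK) -- U + key-range locks: a second
WHERE Id = @ObjectId;                      -- writer waits here, no duplicates

COMMIT TRANSACTION;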

SQL Server READPAST hint

I'm seeing behavior which looks like the READPAST hint is set on the database itself.
The rub: I don't think this is possible.
We have table foo (id int primary key identity, name varchar(50) not null unique);
I have several threads which do, basically
id = select id from foo where name = ?
if id == null
insert into foo (name) values (?)
id = select id from foo where name = ?
Each thread is responsible for inserting its own name (no two threads try to insert the same name at the same time). Client is java.
READ_COMMITTED_SNAPSHOT is ON, transaction isolation is specifically set to READ COMMITTED, using Connection.setTransactionIsolation( Connection.TRANSACTION_READ_COMMITTED );
The symptom is that if one thread is inserting, another thread can't see its row - not even rows that were committed to the database before the application started - so it tries to insert, but gets a duplicate-key exception from the unique index on name.
Throw me a bone here?
You're at the wrong isolation level. Remember what happens with the snapshot isolation level: if one transaction is making a change, no other concurrent transaction sees that change. Period. Other transactions will only see your changes once you have committed, and only if they start after your commit. The solution is to use a different isolation level: wrap your statements in a transaction and SET TRANSACTION ISOLATION LEVEL SERIALIZABLE. This will ensure that your concurrent transactions work as if they were run serially, which is what you seem to want here.
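
A sketch of that suggestion applied to the pseudocode above (@name stands for the parameter the client passes in): under SERIALIZABLE, the range lock taken by the existence check keeps a concurrent thread from inserting the same name between the check and the insert.

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;

-- the UPDLOCK hint helps avoid the lock-conversion deadlocks that plain
-- shared range locks can cause when two sessions probe the same missing key
IF NOT EXISTS (SELECT 1 FROM foo WITH (UPDLOCK) WHERE name = @name)
    INSERT INTO foo (name) VALUES (@name);

SELECT id FROM foo WHERE name = @name;

COMMIT TRANSACTION;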
Sounds like you're not wrapping the select and insert in a transaction?
As a solution, you could:
insert into foo (col1, col2, col3)
select 'a', 'b', 'c'
where not exists (select * from foo where col1 = 'a')
After this, @@ROWCOUNT will be 1 if a row was inserted.
SELECT SCOPE_IDENTITY()
should do the trick here...
plus wrapping it in a transaction, as the previous poster mentioned.
The moral of this story is fully explained in my blog post "You can't hold onto nothing", but the short version is that you want to use the HOLDLOCK hint. I use this pattern:
INSERT INTO dbo.Foo(Name)
SELECT TOP 1
    @name AS Name
FROM (SELECT 1 AS FakeColumn) AS FakeTable
WHERE NOT EXISTS (SELECT * FROM dbo.Foo WITH (HOLDLOCK)
                  WHERE Name = @name)
SELECT ID FROM dbo.Foo WHERE Name = @name
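
The design point here is that the whole insert-if-absent is a single statement: the key-range lock taken by the NOT EXISTS probe (thanks to HOLDLOCK) is held for the remainder of that statement, so a concurrent thread cannot insert the same Name between the existence check and the insert.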
