Is it possible to produce phantom read in single SQL Server query? - sql-server

All the explanations of phantom reads I managed to find demonstrate them by running two SELECT statements in one transaction (e.g. https://blobeater.blog/2017/10/26/sql-server-phantom-reads/ ):
BEGIN TRAN
SELECT #1
DELAY DURING WHICH AN INSERT TAKES PLACE IN A DIFFERENT TRANSACTION
SELECT #2
END TRAN
Is it possible to reproduce a phantom read with one SELECT statement? This would mean the SELECT statement starts in transaction #1, then an INSERT runs in transaction #2 and commits, and finally the SELECT statement in transaction #1 completes but does not return the row that transaction #2 inserted.

The SQL Server Transaction Isolation Levels documentation defines a phantom row as one "that matches the search criteria but is not initially seen" (emphasis mine). Consequently, more than one SELECT statement is needed for a phantom read to occur.
Data inserted during SELECT statement execution might not be returned under the READ COMMITTED isolation level, depending on timing, but this is not a phantom read by definition. The example below shows this behavior.
--create table with enough data for a long-running SELECT query
CREATE TABLE dbo.PhantomReadExample(
PhantomReadExampleID int NOT NULL
CONSTRAINT PK_PhantomReadExample PRIMARY KEY
, PhantomReadData char(8000) NOT NULL
);
--insert 100K rows
WITH
t10 AS (SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) t(n))
,t1k AS (SELECT 0 AS n FROM t10 AS a CROSS JOIN t10 AS b CROSS JOIN t10 AS c)
,t1m AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS num FROM t1k AS a CROSS JOIN t1k AS b)
INSERT INTO dbo.PhantomReadExample WITH(TABLOCKX) (PhantomReadExampleID, PhantomReadData)
SELECT num*2, 'data'
FROM t1m
WHERE num <= 100000;
GO
--run this on connection 1
SELECT *
FROM dbo.PhantomReadExample
ORDER BY PhantomReadExampleID;
GO
--run this on connection 2 while the connection 1 SELECT is running
INSERT INTO dbo.PhantomReadExample(PhantomReadExampleID, PhantomReadData)
VALUES(1, 'data');
GO
Shared locks are acquired on rows as they are read during the SELECT query scan to ensure only committed data are read, but these are immediately released once the data are read, to improve concurrency. This allows other sessions to insert, update, and delete rows while the SELECT query is running.
The inserted row is not returned in this case because the ordered clustered index scan had already passed the point of the insert.
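As a quick diagnostic (my sketch, not part of the original demo), the locks can be watched from a third connection while the connection 1 SELECT is running; the row-level S locks are usually released too quickly to catch, but the scan's table-level intent-shared (IS) lock is visible:
--run on a third connection while the connection 1 SELECT is running;
--the row-level S locks come and go quickly, but the table-level IS lock persists for the scan
SELECT request_session_id,
       resource_type,
       request_mode,
       request_status
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID();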

Below is the Wikipedia definition of phantom reads:
A phantom read occurs when, in the course of a transaction, new rows are added by another transaction to the records being read.
This can occur when range locks are not acquired on performing a SELECT ... WHERE operation. The phantom reads anomaly is a special case of non-repeatable reads, when Transaction 1 repeats a ranged SELECT ... WHERE query and, between both operations, Transaction 2 creates (i.e. INSERT) new rows (in the target table) which fulfill that WHERE clause.
This is certainly possible to reproduce in a single reading query (of course other database activity must also be happening to produce the phantom rows).
Setup
CREATE TABLE Test(X INT PRIMARY KEY);
Connection 1 (leave this running)
SET NOCOUNT ON;
WHILE 1 = 1
INSERT INTO Test VALUES (CRYPT_GEN_RANDOM(4))
Connection 2
This is extremely likely to return some rows if run at the locking implementation of the READ COMMITTED isolation level (the default for the on-premises product, and enforced with the table hint below).
WITH CTE AS
(
SELECT *
FROM Test WITH (READCOMMITTEDLOCK)
WHERE X BETWEEN 0 AND 2147483647
)
SELECT *
FROM CTE c1
FULL OUTER HASH JOIN CTE c2 ON c1.X = c2.X
WHERE (c1.X IS NULL OR c2.X IS NULL)
The returned rows are values added between the first and second read of the table for rows matching the WHERE X BETWEEN 0 AND 2147483647 predicate.
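As a counterpoint (my sketch, not from the original answer): swapping the hint for HOLDLOCK, which is equivalent to SERIALIZABLE for that table, should take key-range locks, so connection 1's inserts block until the statement completes and no phantom rows are returned:
--same query shape, but the key-range locks from HOLDLOCK prevent
--concurrent inserts from landing between the two reads of Test
WITH CTE AS
(
SELECT *
FROM Test WITH (HOLDLOCK)
WHERE X BETWEEN 0 AND 2147483647
)
SELECT *
FROM CTE c1
FULL OUTER HASH JOIN CTE c2 ON c1.X = c2.X
WHERE (c1.X IS NULL OR c2.X IS NULL)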

Related

UPDATE Blocking SELECT Of Unrelated Rows

I have TableA with Col1 as the primary key. I am running the following transaction without committing it (for test purposes).
BEGIN TRANSACTION
UPDATE TableA
SET Col3 = 0
WHERE Col2 = 'AAA'
In the meanwhile, I run the following query and see that it waits on the first transaction to complete.
SELECT *
FROM TableA
WHERE Col2 = 'BBB'
But the following query returns the results immediately:
SELECT *
FROM TableA
WHERE Col1 = '1'
So I thought that the second query might need to read rows holding exclusive locks taken by the first transaction in order to select the rows with Col2 = 'BBB'. That's why I then tried indexing Col2 so that a full scan would not be necessary, but that did not work either; the second query still waits on the first transaction (the attempted index is sketched below).
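For reference, the attempted index would look something like this (the index name is my invention); with SELECT *, the nonclustered index still forces key lookups into the clustered index rows that hold the UPDATE's exclusive locks, which is presumably why it still blocked:
--index on Col2 as described in the question; SELECT * still needs
--key lookups into the X-locked clustered index rows, so it still blocks
CREATE NONCLUSTERED INDEX IX_TableA_Col2 ON TableA (Col2);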
What should be done to prevent the SELECT from blocking (other than using NOLOCK)?
P.S: Transaction isolation level is "Read Committed".

Reader writer deadlock in SQL Server

Is there any way to avoid a deadlock on an UPDATE query without changing (or adding) an index?
The following query always generates a deadlock:
update table1
set Batch_ID = 1
where item_id in (select top 300 t1.item_id
                  from table1 t1
                  inner join table2 t2 on t1.item_id = t2.item_id
                  inner join table3 t3 on t1.item_ID = t3.item_ID
                  where IsNull(t3.item_Delivered, 0) = 0
                    and t1.TBatch_ID is null
                    and t2.Shipper_ID = 2
                    and DateDiff(day, t1.TShipping_Date, getdate()) < 90
                    and (
                          DateDiff(minute, IsNull(t1.LastTrackingDate, DateAdd(day, -2, GetDate())), getdate()) > 180
                          or (DateDiff(minute, IsNull(t1.LastTrackingDate, DateAdd(day, -2, GetDate())), getdate()) > 60 and IsNull(t3.item_Indelivery, 0) = 1)
                        )
                    and t2.Customer_ID not in (700, 800)
                  order by t1.LastTrackingDate, t2.Customer_ID)
Usually I use SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED on a SELECT query (reader), but in this case it is an UPDATE query (writer), so I cannot apply the same reasoning (isolation level).
Is there a way to set the transaction isolation level just for the subquery (just for the SELECT)?
Can I add WITH (NOLOCK) to each table in the SELECT clause of the subquery?
Thanks
The query appears to toggle the Batch_ID column from NULL to 1 on the first 300 rows which meet a certain criteria.
This update is prone to deadlocks: if two connections run the same query concurrently, both will find overlapping table1 rows and both will try to update them (there is a race condition between the rows returned from the subquery and the outer update).
Re: (NOLOCK): no, read uncommitted will lead to even more unpredictable behaviour. One option would be to synchronize concurrent calls to the update by raising the locking pessimism, so that any concurrent connection is blocked until the first connection's batch of 300 has been tagged, e.g.:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
update table1
set Batch_ID=1
where item_id in (select top 300 t1.item_id
From table1 t1 WITH (XLOCK) ...
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
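A fuller sketch of that pattern, abridged to the predicates quoted in the question (the explicit transaction is my addition, so that the exclusive locks are held until the whole batch of 300 is committed):
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN TRANSACTION;

update table1
set Batch_ID = 1
where item_id in (select top 300 t1.item_id
                  from table1 t1 WITH (XLOCK)
                  inner join table2 t2 on t1.item_id = t2.item_id
                  where t1.TBatch_ID is null
                    and t2.Shipper_ID = 2
                    --remaining criteria from the original query go here
                  order by t1.LastTrackingDate, t2.Customer_ID);

COMMIT TRANSACTION;

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;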

Why UPDATE blocks SELECT on unrelated rows?

Given the table defined by script [1], I execute these scripts in two SSMS windows:
--1) first in first SSMS window
set transaction isolation level READ UNCOMMITTED;
begin transaction;
update aaa set Name ='bbb'
where id=1;
-- results in "(1 row(s) affected)"
--rollback
and after 1)
--2) after launching 1)
select * from aaa
where id<>1
--is blocked
Regardless of the transaction isolation level in window 1), the SELECT in 2) is blocked.
Why?
Does the isolation level of the UPDATE have any influence on statements in other transactions?
The isolation level in 2) is at most the default, READ COMMITTED.
No range locks are taken, so the SELECT should only have suffered from non-repeatable reads (under READ COMMITTED) and phantom reads (under REPEATABLE READ) [2].
How can it be made to suffer from those?
How can the UPDATE be made without blocking the SELECT?
[1]
CREATE TABLE aaa
(
Id int IDENTITY(1,1) NOT NULL,
Name varchar(13) NOT NULL
)
insert into aaa(Name)
select '111' union all
select '222' union all
select '333' union all
select '444' union all
select '555' union all
select '666' union all
select '777' union all
select '888'
[2] http://en.wikipedia.org/wiki/Isolation_(database_systems)
Update:
SELECT WITH(NOLOCK) is not blocked...
Update2:
Or, which is the same, with READ UNCOMMITTED.
Note that the UPDATE is on a different row from the SELECT.
Even if it were on the same row, this behavior contradicts the description of the isolation levels [2].
The points are that:
I may not know who else is going to SELECT from the same (updated) table, but on rows unrelated to the update;
I want to understand the isolation levels [2].
SQL Server 2008 R2 Dev
I believe it's because you don't have a primary key, which I think is resulting in the locks being escalated, hence blocking out the SELECT. If you add a PRIMARY KEY onto the ID column, you will notice that if you try again, the SELECT will return the other 3 rows now - no WITH (NOLOCK) hint needed.
Repeating the tests after:
--3)
create index IX_aaa_ID on aaa(id)
SELECT 2) is still blocked
--4)
drop index IX_aaa_ID on aaa
create unique index IX_aaa_ID on aaa(id)
--or adding primary key constraint
SELECT 2) is NOT blocked
If 2) is modified as
--2b)
select * from aaa
where id=3
--or as
--WHERE id=2
then 2b) is not blocked, even in the absence of any index or PK.
Though 2b), without any indexes, is blocked after modifying the 1) UPDATE to run under SERIALIZABLE,
but not under REPEATABLE READ or lower:
--1c)
set transaction isolation level serializable;
--set transaction isolation level REPEATABLE READ;
begin transaction;
update aaa set Name ='bbb'
where id=1;
--rollback
So it looks like a multiple-row SELECT attempts to acquire a non-shareable lock?
Update:
Well, in all cases where the SELECT is blocked, it is waiting to acquire an LCK_M_IS lock.
A good reason to understand this machinery.
Update2:
Well, it is not the UPDATE lock that is escalated on the table; it is the SELECT's (shared) locks that, when the SELECT tries to read multiple rows, are escalated to a table lock, which cannot be granted because the table already holds an exclusive (UPDATE) lock.
And the presence or absence of an index was unrelated to my primary question.
I shift the discussion of this topic to my submitted suggestion "Intent rowlocks should not be escalated to a table lock if a table already contains exclusive lock"
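For anyone reproducing this, the wait described above can be inspected from a third session while 2) is blocked (a diagnostic sketch of mine, not part of the original post):
--shows the blocked SELECT waiting on LCK_M_IS and the session id
--of the connection holding the open UPDATE transaction
SELECT session_id,
       wait_type,
       wait_duration_ms,
       blocking_session_id
FROM sys.dm_os_waiting_tasks
WHERE blocking_session_id IS NOT NULL;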

SQL - Inserting and Updating Multiple Records at Once

I have a stored procedure that is responsible for inserting or updating multiple records at once. I want to perform this in my stored procedure for the sake of performance.
This stored procedure takes in a comma-delimited list of permit IDs and a status. The permit IDs are stored in a variable called @PermitIDs. The status is stored in a variable called @Status. I have a user-defined function that converts this comma-delimited list of permit IDs into a table. I need to go through each of these IDs and do either an insert or update into a table called PermitStatus.
If a record with the permit ID does not exist, I want to add a record. If it does exist, I want to update the record with the given @Status value. I know how to do this for a single ID, but I do not know how to do it for multiple IDs. For single IDs, I do the following:
-- Determine whether to add or edit the PermitStatus
DECLARE @count int
SET @count = (SELECT COUNT(ID) FROM PermitStatus WHERE [PermitID] = @PermitID)
-- If no records were found, insert the record; otherwise, update it
IF @count = 0
BEGIN
    INSERT INTO
        PermitStatus
        (
        [PermitID],
        [UpdatedOn],
        [Status]
        )
    VALUES
        (
        @PermitID,
        GETUTCDATE(),
        1
        )
END
ELSE
    UPDATE
        PermitStatus
    SET
        [UpdatedOn] = GETUTCDATE(),
        [Status] = @Status
    WHERE
        [PermitID] = @PermitID
How do I loop through the records in the Table returned by my user-defined function to dynamically insert or update the records as needed?
Create a split function and use it like:
SELECT *
FROM YourTable y
INNER JOIN dbo.splitFunction(@Parameter) s ON y.ID = s.Value
I prefer the number table approach
For this method to work, you need to do this one-time table setup:
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
INTO Numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number)
Once the Numbers table is set up, create this function:
CREATE FUNCTION [dbo].[FN_ListToTableAll]
(
     @SplitOn char(1)     --REQUIRED, the character to split the @List string on
    ,@List varchar(8000)  --REQUIRED, the list to split apart
)
RETURNS TABLE
AS
RETURN
(
    ----------------
    --SINGLE QUERY-- --this WILL return empty rows
    ----------------
    SELECT
        ROW_NUMBER() OVER(ORDER BY number) AS RowNumber
        ,LTRIM(RTRIM(SUBSTRING(ListValue, number+1, CHARINDEX(@SplitOn, ListValue, number+1)-number - 1))) AS ListValue
    FROM (
            SELECT @SplitOn + @List + @SplitOn AS ListValue
         ) AS InnerQuery
        INNER JOIN Numbers n ON n.Number < LEN(InnerQuery.ListValue)
    WHERE SUBSTRING(ListValue, number, 1) = @SplitOn
);
GO
GO
You can now easily split a CSV string into a table and join on it:
select * from dbo.FN_ListToTableAll(',','1,2,3,,,4,5,6777,,,')
OUTPUT:
RowNumber ListValue
----------- ----------
1 1
2 2
3 3
4
5
6 4
7 5
8 6777
9
10
11
(11 row(s) affected)
To make what you need work, do the following:
--this would be the existing table
DECLARE @OldData table (RowID int, RowStatus char(1))
INSERT INTO @OldData VALUES (10,'z')
INSERT INTO @OldData VALUES (20,'z')
INSERT INTO @OldData VALUES (30,'z')
INSERT INTO @OldData VALUES (70,'z')
INSERT INTO @OldData VALUES (80,'z')
INSERT INTO @OldData VALUES (90,'z')
--these would be the stored procedure input parameters
DECLARE @IDList varchar(500)
       ,@StatusList varchar(500)
SELECT @IDList='10,20,30,40,50,60'
      ,@StatusList='A,B,C,D,E,F'
--stored procedure local variable
DECLARE @InputList table (RowID int, RowStatus char(1))
--convert input parameters into a table
INSERT INTO @InputList
        (RowID, RowStatus)
SELECT
    i.ListValue, s.ListValue
FROM dbo.FN_ListToTableAll(',',@IDList) i
    INNER JOIN dbo.FN_ListToTableAll(',',@StatusList) s ON i.RowNumber=s.RowNumber
--update all old existing rows
UPDATE o
SET RowStatus=i.RowStatus
FROM @OldData o WITH (UPDLOCK, HOLDLOCK) --to avoid a race condition when there is high concurrency, as per @Emtucifor
    INNER JOIN @InputList i ON o.RowID=i.RowID
--insert only the new rows
INSERT INTO @OldData
        (RowID, RowStatus)
SELECT
    i.RowID, i.RowStatus
FROM @InputList i
    LEFT OUTER JOIN @OldData o ON i.RowID=o.RowID
WHERE o.RowID IS NULL
--display the old table
SELECT * FROM @OldData ORDER BY RowID
OUTPUT:
RowID RowStatus
----------- ---------
10 A
20 B
30 C
40 D
50 E
60 F
70 z
80 z
90 z
(9 row(s) affected)
EDIT: Thanks to @Emtucifor for the tip about the race condition (see the links in his answer below); I have included the locking hints in my answer to prevent race condition problems when there is high concurrency.
There are various methods to accomplish the parts you are asking about.
Passing Values
There are dozens of ways to do this. Here are a few ideas to get you started:
Pass in a string of identifiers and parse it into a table, then join.
SQL 2008: Join to a table-valued parameter
Expect data to exist in a predefined temp table and join to it
Use a session-keyed permanent table
Put the code in a trigger and join to the INSERTED and DELETED tables in it.
Erland Sommarskog provides a wonderful, comprehensive discussion of lists in SQL Server. In my opinion, the table-valued parameter in SQL 2008 is the most elegant solution for this, as sketched below.
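For illustration, a minimal table-valued-parameter sketch (the type and procedure names are mine, and the body is reduced to the update half of the upsert):
--one-time setup: a table type to carry the permit IDs
CREATE TYPE dbo.PermitIDList AS TABLE (PermitID int NOT NULL PRIMARY KEY);
GO
CREATE PROCEDURE dbo.PermitStatusUpdateTvp
    @PermitIDs dbo.PermitIDList READONLY, --TVPs must be declared READONLY
    @Status int
AS
UPDATE S
SET UpdatedOn = GETUTCDATE(), Status = @Status
FROM PermitStatus S
INNER JOIN @PermitIDs P ON S.PermitID = P.PermitID
GO
--the caller fills a variable of the type and passes it; no CSV parsing needed
DECLARE @IDs dbo.PermitIDList;
INSERT @IDs (PermitID) VALUES (123), (124), (125);
EXEC dbo.PermitStatusUpdateTvp @PermitIDs = @IDs, @Status = 1;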
Upsert/Merge
Perform a separate UPDATE and INSERT (two queries, one for each set, not row-by-row).
SQL 2008: MERGE.
An Important Gotcha
However, one thing that no one else has mentioned is that almost all upsert code, including SQL 2008 MERGE, suffers from race condition problems when there is high concurrency. Unless you use HOLDLOCK and other locking hints depending on what's being done, you will eventually run into conflicts. So you either need to lock, or respond to errors appropriately (some systems with huge transactions per second have used the error-response method successfully, instead of using locks).
One thing to realize is that different combinations of lock hints implicitly change the transaction isolation level, which affects what type of locks are acquired. This changes everything: which other locks are granted (such as a simple read), the timing of when a lock is escalated to update from update intent, and so on.
I strongly encourage you to read more detail on these race condition problems. You need to get this right.
Conditional Insert/Update Race Condition
“UPSERT” Race Condition With MERGE
Example Code
CREATE PROCEDURE dbo.PermitStatusUpdate
    @PermitIDs varchar(8000), -- or (max)
    @Status int
AS
SET NOCOUNT, XACT_ABORT ON -- see note below
BEGIN TRAN
DECLARE @Permits TABLE (
    PermitID int NOT NULL PRIMARY KEY CLUSTERED
)
INSERT @Permits
SELECT Value FROM dbo.Split(@PermitIDs) -- split function of your choice
UPDATE S
SET
    UpdatedOn = GETUTCDATE(),
    Status = @Status
FROM
    PermitStatus S WITH (UPDLOCK, HOLDLOCK)
    INNER JOIN @Permits P ON S.PermitID = P.PermitID
INSERT PermitStatus (
    PermitID,
    UpdatedOn,
    Status
)
SELECT
    P.PermitID,
    GETUTCDATE(),
    @Status
FROM @Permits P
WHERE NOT EXISTS (
    SELECT 1
    FROM PermitStatus S
    WHERE P.PermitID = S.PermitID
)
COMMIT TRAN
RETURN @@ERROR;
Note: XACT_ABORT helps guarantee the explicit transaction is closed following a timeout or unexpected error.
To confirm that this handles the locking problem, open several query windows and execute an identical batch like so:
WAITFOR TIME '11:00:00' -- use a time in the near future
EXEC dbo.PermitStatusUpdate @PermitIDs = '123,124,125,126', @Status = 1
All of these different sessions will execute the stored procedure in nearly the same instant. Check each session for errors. If none exist, try the same test a few times more (since it's possible to not always have the race condition occur, especially with MERGE).
The writeups at the links I gave above give even more detail than I did here, and also describe what to do for the SQL 2008 MERGE statement as well. Please read those thoroughly to truly understand the issue.
Briefly, with MERGE, no explicit transaction is needed, but you do need to use SET XACT_ABORT ON and use a locking hint:
SET NOCOUNT, XACT_ABORT ON;
MERGE dbo.Table WITH (HOLDLOCK) AS TableAlias
...
This will prevent concurrency race conditions causing errors.
I also recommend that you do error handling after each data modification statement.
If you're using SQL Server 2008, you can use table valued parameters - you pass in a table of records into a stored procedure and then you can do a MERGE.
Passing in a table valued parameter would remove the need to parse CSV strings.
Edit:
ErikE has raised the point about race conditions; please refer to his answer and linked articles.
If you have SQL Server 2008, you can use MERGE. Here's an article describing this.
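For illustration, a hedged MERGE sketch against the question's PermitStatus table (the @Permits table variable of parsed IDs follows ErikE's procedure above; this is my sketch, not the linked article's code):
SET NOCOUNT, XACT_ABORT ON;

--HOLDLOCK serializes the read/write pair inside MERGE,
--closing the upsert race window described in the other answers
MERGE PermitStatus WITH (HOLDLOCK) AS target
USING @Permits AS source
    ON target.PermitID = source.PermitID
WHEN MATCHED THEN
    UPDATE SET UpdatedOn = GETUTCDATE(),
               Status = @Status
WHEN NOT MATCHED THEN
    INSERT (PermitID, UpdatedOn, Status)
    VALUES (source.PermitID, GETUTCDATE(), @Status);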
You should be able to do your insert and your update as two set-based queries.
The code below was based on a data load procedure that I wrote a while ago that took data from a staging table and inserted or updated it into the main table.
I've tried to make it match your example, but you may need to tweak this (and create a table-valued UDF to parse your CSV into a table of IDs).
-- Update where the join on permitstatus matches
Update
PermitStatus
Set
[UpdatedOn]=GETUTCDATE(),
[Status]=staging.Status
From
PermitStatus status
Join
StagingTable staging
On
staging.PermitId = status.PermitId
-- Insert the new records, based on the Where Not Exists
Insert
PermitStatus (UpdatedOn, Status, PermitId)
Select GETUTCDATE(), staging.Status, staging.PermitId
From
StagingTable staging
Where Not Exists
(
Select 1 from PermitStatus status
Where status.PermitId = staging.PermitId
)
Essentially you have an upsert stored procedure (e.g. UpsertSinglePermit)
(like the code you have given above) for dealing with one row.
So the steps I see are to create a new stored procedure (UpsertNPermits) which does:
a) Parse the input string into n record entries (each record contains a permit ID and status)
b) For each entry above, invoke UpsertSinglePermit, as sketched below.
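A rough sketch of that row-by-row shape (the dbo.Split function and the UpsertSinglePermit signature are assumptions; the set-based answers above will generally perform better):
CREATE PROCEDURE dbo.UpsertNPermits
    @PermitIDs varchar(8000),
    @Status int
AS
SET NOCOUNT ON;

DECLARE @PermitID int;

--a) parse the input string into rows; b) upsert each entry one at a time
DECLARE permit_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT Value FROM dbo.Split(@PermitIDs); --split function of your choice

OPEN permit_cursor;
FETCH NEXT FROM permit_cursor INTO @PermitID;

WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC dbo.UpsertSinglePermit @PermitID = @PermitID, @Status = @Status;
    FETCH NEXT FROM permit_cursor INTO @PermitID;
END

CLOSE permit_cursor;
DEALLOCATE permit_cursor;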

In T-SQL, is an Insert with a Select statement safe in terms of concurrency?

In my answer to this SO question I suggest using a single insert statement, with a select that increments a value, as shown below.
Insert Into VersionTable
(Id, VersionNumber, Title, Description, ...)
Select @ObjectId, max(VersionNumber) + 1, @Title, @Description
From VersionTable
Where Id = @ObjectId
I suggested this because I believe that this statement is safe in terms of concurrency, in that if another insert for the same object id is run at the same time, there is no chance of having duplicate version numbers.
Am I correct?
As Paul writes: No, it's not safe, for which I would like to add empirical evidence: Create a table Table_1 with one field ID and one record with value 0. Then execute the following code simultaneously in two Management Studio query windows:
declare @counter int
set @counter = 0
while @counter < 1000
begin
    set @counter = @counter + 1
    INSERT INTO Table_1
    SELECT MAX(ID) + 1 FROM Table_1
end
Then execute
SELECT ID, COUNT(*) FROM Table_1 GROUP BY ID HAVING COUNT(*) > 1
On my SQL Server 2008, one ID (662) was created twice. Thus, the default isolation level applied to single statements is not sufficient.
EDIT: Clearly, wrapping the INSERT with BEGIN TRANSACTION and COMMIT won't fix it, since the default isolation level for transactions is still READ COMMITTED, which is not sufficient. Note that setting the transaction isolation level to REPEATABLE READ is also not sufficient. The only way to make the above code safe is to add
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
at the top. This, however, caused deadlocks every now and then in my tests.
EDIT: The only solution I found which is safe and does not produce deadlocks (at least in my tests) is to explicitly lock the table exclusively (default transaction isolation level is sufficient here). Beware though; this solution might kill performance:
...loop stuff...
BEGIN TRANSACTION
SELECT * FROM Table_1 WITH (TABLOCKX, HOLDLOCK) WHERE 1=0
INSERT INTO Table_1
SELECT MAX(ID) + 1 FROM Table_1
COMMIT
...loop end...
The default isolation level of READ COMMITTED makes this unsafe: if two of these run in perfect parallel, you will get a duplicate, since no read lock is held.
You need REPEATABLE READ or SERIALIZABLE isolation levels to make it safe.
I think your assumption is incorrect. When you query the VersionTable, you are only putting a read lock on the row. This does not prevent other users from reading the same row from the same table. Therefore, it is possible for two processes to read the same row in the VersionTable at the same time and generate the same VersionNumber value.
You need a unique constraint on (Id, VersionNumber) to enforce it.
I'd use ROWLOCK, XLOCK hints to block other sessions from reading the locked row where you calculate,
or wrap the INSERT in a TRY/CATCH: if I get a duplicate, try again (sketched below).
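A sketch of that retry approach, assuming the unique constraint on (Id, VersionNumber) suggested above is in place (the variable values are placeholders; THROW needs SQL Server 2012+, use RAISERROR on older versions):
DECLARE @ObjectId int = 1,                              --placeholder values,
        @Title varchar(100) = 'placeholder title',      --as in the original
        @Description varchar(100) = 'placeholder desc'; --statement

DECLARE @done bit = 0;
WHILE @done = 0
BEGIN
    BEGIN TRY
        --the insert/select from the question; the unique constraint
        --turns a concurrent duplicate into error 2601/2627
        INSERT INTO VersionTable (Id, VersionNumber, Title, Description)
        SELECT @ObjectId, MAX(VersionNumber) + 1, @Title, @Description
        FROM VersionTable
        WHERE Id = @ObjectId;

        SET @done = 1;
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() NOT IN (2601, 2627)
            THROW; --anything other than a duplicate key is a real error
    END CATCH
END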
