SQL Server READPAST hint - sql-server

I'm seeing behavior which looks like the READPAST hint is set on the database itself.
The rub: I don't think this is possible.
We have table foo (id int primary key identity, name varchar(50) not null unique);
I have several threads which do, basically
id = select id from foo where name = ?
if id == null
insert into foo (name) values (?)
id = select id from foo where name = ?
Each thread is responsible for inserting its own name (no two threads try to insert the same name at the same time). Client is java.
READ_COMMITTED_SNAPSHOT is ON, transaction isolation is specifically set to READ COMMITTED, using Connection.setTransactionIsolation( Connection.TRANSACTION_READ_COMMITTED );
Symptom is that if one thread is inserting, the other thread can't see its row -- even rows which were committed to the database before the application started -- and tries to insert, but gets a duplicate-key exception from the unique index on name.
Throw me a bone here?

You're at the wrong isolation level. Remember what happens with the snapshot isolation level: if one transaction is making a change, no other concurrent transaction sees that change. Period. Other transactions will only see your changes once you have committed, and only if they START after your commit. The solution is to use a different isolation level: wrap your statements in a transaction and SET TRANSACTION ISOLATION LEVEL SERIALIZABLE. This will ensure that your concurrent transactions work as if they were all run serially, which is what you seem to want here.
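A minimal sketch of that fix, using the foo table from the question (the literal value for @name is hypothetical; each thread binds its own):
DECLARE @name varchar(50) = 'some-name';  -- hypothetical value

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;

DECLARE @id int = (SELECT id FROM foo WHERE name = @name);
IF @id IS NULL
BEGIN
    INSERT INTO foo (name) VALUES (@name);
    SET @id = SCOPE_IDENTITY();
END

COMMIT TRANSACTION;
Note that, as discussed further down this page, SERIALIZABLE check-then-insert patterns can themselves deadlock under contention; the UPDLOCK/HOLDLOCK variants shown later avoid that.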

Sounds like you're not wrapping the select and insert into a transaction?
As a solution, you could:
insert into foo (col1, col2, col3)
select 'a', 'b', 'c'
where not exists (select * from foo where col1 = 'a')
After this, you can check @@ROWCOUNT: it will be 1 if a row was inserted.

SELECT SCOPE_IDENTITY()
should do the trick here...
plus wrapping it in a transaction, as the previous poster mentioned.
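For instance, a minimal sketch (again assuming the question's foo table and a hypothetical @name value):
DECLARE @name varchar(50) = 'some-name';  -- hypothetical value

INSERT INTO foo (name) VALUES (@name);

-- SCOPE_IDENTITY() returns the identity value generated in this scope,
-- so the second lookup SELECT is unnecessary
SELECT SCOPE_IDENTITY() AS id;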

The moral of this story is fully explained in my blog post "You can't hold onto nothing" but the short version of this is that you want to use the HOLDLOCK hint. I use the pattern:
INSERT INTO dbo.Foo (Name)
SELECT TOP 1 @name AS Name
FROM (SELECT 1 AS FakeColumn) AS FakeTable
WHERE NOT EXISTS (SELECT * FROM dbo.Foo WITH (HOLDLOCK)
                  WHERE Name = @name)

SELECT ID FROM dbo.Foo WHERE Name = @name

Related

What is the best practice for inserting a record if it doesn't already exist?

I know at least three ways to insert a record if it doesn't already exist in a table:
The first one uses IF NOT EXISTS:
IF NOT EXISTS(select 1 from table where <condition>)
INSERT...VALUES
The second one is using merge:
MERGE table AS target
USING (SELECT values) AS source
ON (condition)
WHEN NOT MATCHED THEN
INSERT ... VALUES ...
The third one is using insert...select:
INSERT INTO table (<values list>)
SELECT <values list>
WHERE NOT EXISTS(select 1 from table where <condition>)
But which one is the best?
The first option does not seem to be thread-safe, as the record might be inserted between the select statement in the IF and the insert statement that follows, if two or more users try to insert the same record.
As for the second option, MERGE seems to be overkill for this, as the documentation states:
Performance Tip: The conditional behavior described for the MERGE statement works best when the two tables have a complex mixture of matching characteristics. For example, inserting a row if it does not exist, or updating the row if it does match. When simply updating one table based on the rows of another table, improved performance and scalability can be achieved with basic INSERT, UPDATE, and DELETE statements.
So I think the third option is the best for this scenario (only insert the record if it doesn't already exist, no need to update if it does), but I would like to know what SQL Server experts think.
Please note that after the insert, I'm not interested in knowing whether the record was already there or whether it's a brand new record; I just need it to be there so that I can carry on with the rest of the stored procedure.
When you need to guarantee the uniqueness of records on a condition that cannot be expressed by a UNIQUE or PRIMARY KEY constraint, you indeed need to make sure that the check for existence and the insert are done in one transaction. You can achieve this by either:
Using one SQL statement performing the check and the insert (your third option)
Using a transaction with the appropriate isolation level
There is a fourth way though that will help you better structure your code and also make it work in situations where you need to process a batch of records at once. You can create a TABLE variable or a temporary table, insert all of the records that need to be inserted in there and then write the INSERT, UPDATE and DELETE statements based on this variable.
Below is (pseudo)code demonstrating this approach:
-- Logic to create the data to be inserted if necessary
DECLARE @toInsert TABLE (idCol INT PRIMARY KEY, dataCol VARCHAR(MAX))
INSERT INTO @toInsert (idCol, dataCol) VALUES (1,'row 1'), (2,'row 2'), (3,'row 3')

-- Logic to insert the data
INSERT INTO realTable (idCol, dataCol)
SELECT TI.idCol, TI.dataCol
FROM @toInsert TI
WHERE NOT EXISTS (SELECT 1 FROM realTable RT WHERE RT.dataCol = TI.dataCol)
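The UPDATE and DELETE mentioned above follow the same shape; as one hedged example, an UPDATE based on the same table variable (assuming idCol identifies the target row) might look like:
-- Update rows that already exist, matching on the key (a sketch)
UPDATE RT
SET RT.dataCol = TI.dataCol
FROM realTable RT
INNER JOIN @toInsert TI ON RT.idCol = TI.idCol;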
In many situations I use this approach, as it makes the TSQL code easier to read, refactor and apply unit tests to.
Following Vladimir Baranov's comment and reading Dan Guzman's blog posts about the Conditional INSERT/UPDATE Race Condition and the "UPSERT" Race Condition With MERGE, it seems all three options suffer from the same drawbacks in a multi-user environment.
Eliminating the merge option as overkill, we are left with options 1 and 3.
Dan's proposed solution is to use an explicit transaction and add lock hints to the select to avoid race condition.
This way, option 1 becomes:
BEGIN TRANSACTION
IF NOT EXISTS(select 1 from table WITH (UPDLOCK, HOLDLOCK) where <condition>)
BEGIN
INSERT...VALUES
END
COMMIT TRANSACTION
and option 2 becomes:
BEGIN TRANSACTION
INSERT INTO table (<values list>)
SELECT <values list>
WHERE NOT EXISTS(select 1 from table WITH (UPDLOCK, HOLDLOCK) where <condition>)
COMMIT TRANSACTION
Of course, in both options there needs to be some error handling: every transaction should use a TRY...CATCH so that we can roll back the transaction in case of an error.
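A sketch of that error handling around option 1, using a hypothetical dbo.Customers table whose existence check is on Email (the names and types are illustrative, not from the question):
DECLARE @Email nvarchar(256) = N'someone@example.com';  -- hypothetical parameter

BEGIN TRY
    BEGIN TRANSACTION;

    IF NOT EXISTS (SELECT 1 FROM dbo.Customers WITH (UPDLOCK, HOLDLOCK)
                   WHERE Email = @Email)
    BEGIN
        INSERT INTO dbo.Customers (Email) VALUES (@Email);
    END

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    THROW;  -- re-raise the original error (SQL Server 2012+)
END CATCH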
That being said, I think the 3rd option is probably my personal favorite, but I don't think there should be a difference.
Update
Following a conversation I've had with Aaron Bertrand in the comments of some other question - I'm not entirely convinced that using ISOLATION LEVEL is a better solution than individual query hints, but at least that's another option to consider:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
INSERT INTO table (<values list>)
SELECT <values list>
WHERE NOT EXISTS(select 1 from table where <condition>);
COMMIT TRANSACTION;

Avoid inserting duplicate records in SQL Server

I haven't been able to find an answer to this. Suppose I have the following table/query:
The table:
create table ##table
(
column1 int,
column2 nvarchar(max)
)
The query (in a real life scenario the condition will be more complex):
declare @shouldInsert bit
set @shouldInsert = case when exists(
    select *
    from ##table
    where column2 = 'test') then 1 else 0 end

--Exaggerating a possible delay:
waitfor delay '00:00:10'

if(@shouldInsert = 0)
    insert into ##table
    values(1, 'test')
If I run this query twice simultaneously, it's liable to insert duplicate records. (Enforcing a unique constraint is out of the question, because the real-life condition is more involved than mere "column1" uniqueness across the table.)
I see two possible solutions:
I run both concurrent transactions in serializable mode, but that creates a deadlock (first a shared lock in the select, then an X-lock in the insert - deadlock).
In the select statement I use the query hints WITH (UPDLOCK, TABLOCK), which will effectively X-lock the entire table, but that prevents other transactions from reading data (something I'd like to avoid)
Which is more acceptable? Is there a third solution?
Thanks.
If you can, you should put a UNIQUE constraint (or index) on whatever column(s) define the uniqueness.
With this, you might still get the "OK, doesn't exist yet" response from your initial check in two separate processes - but one of the two will be first and get its row inserted, while the second will get a "unique constraint violated" exception back from the database.
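A minimal sketch of that idea on the question's ##table; note that an index key cannot be nvarchar(max), so this assumes column2 can be narrowed (or a computed hash column used instead):
-- Assumes column2 was declared nvarchar(450) rather than nvarchar(max);
-- index keys cannot use nvarchar(max)
CREATE UNIQUE NONCLUSTERED INDEX UX_table_column2 ON ##table (column2);

BEGIN TRY
    INSERT INTO ##table (column1, column2) VALUES (1, 'test');
END TRY
BEGIN CATCH
    -- 2601 = duplicate row in a unique index, 2627 = unique constraint violation
    IF ERROR_NUMBER() NOT IN (2601, 2627)
        THROW;
    -- otherwise: the row already exists, which is all we needed
END CATCH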
Regardless of how "involved" your "real-life condition" is, you have two options: enforce UNIQUE or deal with multiple records. Any work-around will likely be fragile.
For example, your delay hack is pretty useless if you need to add another DB server, or if overwhelming load slows down the execution of individual threads.
One way to allow for multiple copies of a should-be-unique value is to create another table that can act as a queue (without enforcing uniqueness) and a serial worker to dequeue it. Or change the data structure to allow 1-to-many and pick the first one when querying. Still a hack, but at least not terribly "creative", and it can't break.
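A minimal sketch of the queue idea, with hypothetical names:
-- Hypothetical staging queue: duplicates are allowed here
CREATE TABLE dbo.InsertQueue
(
    queueId int IDENTITY(1,1) PRIMARY KEY,
    column1 int,
    column2 nvarchar(max)
);

-- A single serial worker drains the queue; with only one dequeuer there is no race
DECLARE @qid int, @c1 int, @c2 nvarchar(max);

SELECT TOP (1) @qid = queueId, @c1 = column1, @c2 = column2
FROM dbo.InsertQueue
ORDER BY queueId;

IF @qid IS NOT NULL
BEGIN
    IF NOT EXISTS (SELECT 1 FROM ##table WHERE column2 = @c2)
        INSERT INTO ##table (column1, column2) VALUES (@c1, @c2);

    DELETE FROM dbo.InsertQueue WHERE queueId = @qid;
END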
declare @shouldInsert bit
set @shouldInsert = case when exists(
    select *
    from ##table
    where column2 = 'test') then 1 else 0 end

--Exaggerating a possible delay:
waitfor delay '00:00:10'

truncate table #temp

if(@shouldInsert = 0)
    insert into #temp
    values(1, 'test')

-- if the record is not already in ##table, it will be inserted there from #temp
insert into ##table
select * from #temp
except
select * from ##table

Select and Delete in the same transaction using TOP clause

I have a table to which data is continuously added at a rapid pace.
I need to fetch records from this table and immediately remove them, so I cannot process the same record a second time. And since the data is being added at a faster rate, I need to use the TOP clause so that only a small number of records go to the business logic for processing at a time.
I am using the query below:
BEGIN TRAN readrowdata

SELECT TOP 5
    [RawDataId],
    [RawData]
FROM
    [TABLE] WITH (HOLDLOCK);

WITH q AS
(
    SELECT TOP 5
        [RawDataId],
        [RawData]
    FROM
        [TABLE] WITH (HOLDLOCK)
)
DELETE FROM q;

COMMIT TRANSACTION readrowdata
I am using HOLDLOCK here so new data cannot be inserted into the table while I am performing the SELECT and DELETE operations. I used it because, suppose there are only 3 records in the table: the SELECT statement will get 3 records, and if a new record gets inserted at the same time, the DELETE statement would delete 4 records, so I would lose 1 row.
Is the query OK in performance terms? If I can improve it, then please provide your suggestions.
Thank you
Personally, I'd use a different approach. One with less locking, but also extra information signifying that certain records are currently being processed...
DECLARE @rowsBeingProcessed TABLE (
    id INT
);

WITH rows AS (
    SELECT TOP 5 [RawDataId], [processing_start]
    FROM yourTable
    WHERE processing_start IS NULL
)
UPDATE rows
SET processing_start = GETDATE()
OUTPUT INSERTED.RawDataId INTO @rowsBeingProcessed;

-- Business Logic Here

DELETE yourTable WHERE RawDataId IN (SELECT id FROM @rowsBeingProcessed);
Then you can also add checks like "if a record has been 'beingProcessed' for more than 10 minutes, assume that the business logic failed", etc, etc.
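For that staleness check, a hedged sketch (assuming the same processing_start column as above):
-- Hypothetical reaper: reclaim rows whose processing appears to have died
UPDATE yourTable
SET processing_start = NULL
WHERE processing_start IS NOT NULL
  AND processing_start < DATEADD(MINUTE, -10, GETDATE());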
By locking the table in this way, you force other processes to wait for your transaction to complete. This can quickly hurt scalability and performance - and it tends to be hard to predict, because there's often a chain of components all relying on your database.
If you have multiple clients each running this query, and multiple clients adding new rows to the table, overall system performance is likely to deteriorate at times: each "read" client waits for a lock, the number of "write" clients waiting to insert data grows, and they in turn may tie up other components (whatever is generating the data you want to insert).
Diego's answer is on the money - put the data into a variable, and delete matching rows. Don't use locks in SQL Server if you can possibly avoid it!
You can do it very easily with TRIGGERS. The example below shows an approach that avoids holding up other users who are trying to insert data at the same time.
Data definition language
CREATE TABLE SampleTable
(
    id int
)
Sample record
insert into SampleTable (id) values (1)
Sample trigger
CREATE TRIGGER SampleTableTrigger
ON SampleTable AFTER INSERT
AS
IF EXISTS (SELECT id FROM INSERTED)
BEGIN
    SET NOCOUNT ON
    SET XACT_ABORT ON
    BEGIN TRY
        BEGIN TRAN
        SELECT id FROM INSERTED
        DELETE FROM yourTable WHERE id IN (SELECT id FROM INSERTED);
        COMMIT TRAN
    END TRY
    BEGIN CATCH
        ROLLBACK TRAN
    END CATCH
END
Hope this is very simple and helpful.
If I understand you correctly, you are worried that between your select and your delete, more records will be inserted and the first TOP 5 will be different from the second TOP 5?
If so, why don't you load your first select into a temp table or variable (or at least the PKs), do whatever you have to do with your data, and then do your delete based on this table? A sketch follows.
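A minimal sketch of that idea, reusing the question's [TABLE] and RawDataId names (the ORDER BY is an assumption, added to make the batch deterministic):
DECLARE @batch TABLE (RawDataId int PRIMARY KEY);

-- Snapshot the keys of the batch first
INSERT INTO @batch (RawDataId)
SELECT TOP (5) RawDataId
FROM [TABLE]
ORDER BY RawDataId;

-- ... process the rows identified in @batch ...

-- Delete exactly the rows that were processed, by key
DELETE t
FROM [TABLE] t
INNER JOIN @batch b ON b.RawDataId = t.RawDataId;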
I know that it's an old question, but I found a solution here https://www.simple-talk.com/sql/learn-sql-server/the-delete-statement-in-sql-server/:
DECLARE @Output table
(
    StaffID INT,
    FirstName NVARCHAR(50),
    LastName NVARCHAR(50),
    CountryRegion NVARCHAR(50)
);

DELETE ss
OUTPUT DELETED.* INTO @Output
FROM Sales.vSalesPerson sp
INNER JOIN dbo.SalesStaff ss
    ON sp.BusinessEntityID = ss.StaffID
WHERE sp.SalesLastYear = 0;

SELECT * FROM @Output;
Maybe it will be helpful for you.
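Applied to the question's table, the same DELETE ... OUTPUT idea might look like this (a sketch; RawData's type is assumed, and TOP without ORDER BY claims an arbitrary 5 rows):
DECLARE @taken TABLE (RawDataId int, RawData nvarchar(max));

-- Atomically claim and remove up to 5 rows
DELETE TOP (5)
FROM [TABLE]
OUTPUT DELETED.RawDataId, DELETED.RawData INTO @taken;

-- The claimed rows are now exclusively ours to process
SELECT RawDataId, RawData FROM @taken;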

Why UPDATE blocks SELECT on unrelated rows?

Having the table defined by script [1], I execute scripts in 2 windows of SSMS:
--1) first in first SSMS window
set transaction isolation level READ UNCOMMITTED;
begin transaction;
update aaa set Name ='bbb'
where id=1;
-- results in "(1 row(s) affected)"
--rollback
and after 1)
--2)after launching 1)
select * from aaa --deleted comments
where id<>1
--is blocked
Independently of the transaction isolation level in window 1), the SELECT in 2) is blocked.
Why?
Does the isolation level of the UPDATE have any influence on statements in other transactions?
The highest isolation level in play is the default READ COMMITTED, in 2).
No range locks are taken; the SELECT should merely have been exposed to non-repeatable reads (under READ COMMITTED) and phantom reads (under REPEATABLE READ) [2].
How can it be made to suffer from those anomalies instead of blocking?
How can the UPDATE be made without blocking the SELECT?
[1]
CREATE TABLE aaa
(
Id int IDENTITY(1,1) NOT NULL,
Name varchar(13) NOT NULL
)
insert into aaa(Name)
select '111' union all
select '222' union all
select '333' union all
select '444' union all
select '555' union all
select '666' union all
select '777' union all
select '888'
[2]
Copy & paste the link, or add the trailing ")" upon clicking:
http://en.wikipedia.org/wiki/Isolation_(database_systems)
Update:
SELECT WITH(NOLOCK) is not blocked...
Update 2:
or, what is the same, with READ UNCOMMITTED.
Note that the UPDATE is on a different row from the SELECT.
Even if it were on the same row, this behavior contradicts the description of the isolation levels [2].
The points are that:
suppose I cannot know who else is going to SELECT from the same (UPDATE-d) table, but on rows unrelated to the update
I want to understand isolation levels [2]
SQL Server 2008 R2 Dev
I believe it's because you don't have a primary key, which I think results in the locks being escalated and hence blocking the SELECT. If you add a PRIMARY KEY to the Id column, you will notice that if you try again, the SELECT will return the other rows - no WITH (NOLOCK) hint needed.
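For reference, a one-line sketch of that suggested change:
-- The suggested fix: a primary key on Id
ALTER TABLE aaa ADD CONSTRAINT PK_aaa PRIMARY KEY CLUSTERED (Id);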
Repeating tests after
--3)
create index IX_aaa_ID on aaa(id)
The SELECT in 2) is still blocked
--4)
drop index IX_aaa_ID on aaa
create unique index IX_aaa_ID on aaa(id)
--or adding primary key constraint
The SELECT in 2) is NOT blocked
If 2) is modified as
--2b)
select * from aaa
where id=3
--or as
--WHERE id=2
the test shows that 2b) is not blocked even in the absence of any index or PK.
Though 2b), without any indexes, is blocked after modifying the UPDATE in 1) to run under SERIALIZABLE,
but not under REPEATABLE READ or lower:
--1c)
set transaction isolation level serializable;
--set transaction isolation level REPEATABLE READ;
begin transaction;
update aaa set Name ='bbb'
where id=1;
--rollback
So, it looks like a multiple-row SELECT attempts to acquire a non-shareable lock?
Update:
Well, in all cases where the SELECT is blocked, it is waiting to acquire an LCK_M_IS lock.
A good reason to understand this cuisine.
Update 2:
Well, it is not the UPDATE lock that is escalated on the table: it is the SELECT's (shared) locks (taken when the SELECT tries to read multiple rows) that are escalated to a table lock, which cannot be granted because the table already holds an exclusive (UPDATE) lock.
And the presence or absence of an index was unrelated to my primary question.
I have shifted the discussion of this topic to my submitted suggestion "Intent rowlocks should not be escalated to a table lock if a table already contains exclusive lock"

In tsql is an Insert with a Select statement safe in terms of concurrency?

In my answer to this SO question I suggest using a single insert statement, with a select that increments a value, as shown below.
Insert Into VersionTable
(Id, VersionNumber, Title, Description, ...)
Select @ObjectId, max(VersionNumber) + 1, @Title, @Description
From VersionTable
Where Id = @ObjectId
I suggested this because I believe that this statement is safe in terms of concurrency, in that if another insert for the same object id is run at the same time, there is no chance of having duplicate version numbers.
Am I correct?
As Paul writes: No, it's not safe, for which I would like to add empirical evidence: Create a table Table_1 with one field ID and one record with value 0. Then execute the following code simultaneously in two Management Studio query windows:
declare @counter int
set @counter = 0
while @counter < 1000
begin
    set @counter = @counter + 1
    INSERT INTO Table_1
    SELECT MAX(ID) + 1 FROM Table_1
end
Then execute
SELECT ID, COUNT(*) FROM Table_1 GROUP BY ID HAVING COUNT(*) > 1
On my SQL Server 2008, one ID (662) was created twice. Thus, the default isolation level applied to single statements is not sufficient.
EDIT: Clearly, wrapping the INSERT with BEGIN TRANSACTION and COMMIT won't fix it, since the default isolation level for transactions is still READ COMMITTED, which is not sufficient. Note that setting the transaction isolation level to REPEATABLE READ is also not sufficient. The only way to make the above code safe is to add
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
at the top. This, however, caused deadlocks every now and then in my tests.
EDIT: The only solution I found which is safe and does not produce deadlocks (at least in my tests) is to explicitly lock the table exclusively (default transaction isolation level is sufficient here). Beware though; this solution might kill performance:
...loop stuff...
BEGIN TRANSACTION
SELECT * FROM Table_1 WITH (TABLOCKX, HOLDLOCK) WHERE 1=0
INSERT INTO Table_1
SELECT MAX(ID) + 1 FROM Table_1
COMMIT
...loop end...
The default isolation level of READ COMMITTED makes this unsafe: if two of these run in perfect parallel, you will get a duplicate, since no read lock is held.
You need the REPEATABLE READ or SERIALIZABLE isolation level to make it safe.
I think your assumption is incorrect. When you query the VersionTable, you are only putting a read lock on the row. This does not prevent other users from reading the same row from the same table. Therefore, it is possible for two processes to read the same row in the VersionTable at the same time and generate the same VersionNumber value.
You need a UNIQUE constraint on (Id, VersionNumber) to enforce it.
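A sketch of that constraint:
-- Let the database reject duplicate version numbers outright
ALTER TABLE VersionTable
ADD CONSTRAINT UQ_VersionTable_Id_Version UNIQUE (Id, VersionNumber);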
I'd use ROWLOCK, XLOCK hints to block other processes from reading the locked row where you calculate,
or wrap the INSERT in a TRY/CATCH. If I get a duplicate, try again...
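A hedged sketch of that retry idea, assuming the UNIQUE constraint above is in place (the variable values are hypothetical):
DECLARE @ObjectId int = 1,                          -- hypothetical values
        @Title nvarchar(100) = N'A title',
        @Description nvarchar(max) = N'A description';
DECLARE @done bit = 0;

WHILE @done = 0
BEGIN
    BEGIN TRY
        INSERT INTO VersionTable (Id, VersionNumber, Title, Description)
        SELECT @ObjectId, ISNULL(MAX(VersionNumber), 0) + 1, @Title, @Description
        FROM VersionTable
        WHERE Id = @ObjectId;

        SET @done = 1;
    END TRY
    BEGIN CATCH
        -- 2601/2627 = duplicate key from the unique constraint: retry;
        -- anything else is re-raised (SQL Server 2012+)
        IF ERROR_NUMBER() NOT IN (2601, 2627)
            THROW;
    END CATCH
END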
