Primary Key violation after checking for existence of row - sql-server

Question: What things could cause the following problem?
We have a stored proc that updates or inserts a price record into a table. Very straight forward. However, at intermittent times we get a primary key violation trying to insert a row that already exists. As you see, we are checking for the existence of a record and then updating or inserting as appropriate.
This happens very rarely, but the last time it happened we had 18 occurrences in a 3 minute window. We updated a lot more rows than 18, so it is not every time. This SP is called quite often. We checked and there was no index maintenance going on at the time. The application that calls this SP just loops through a queue to update/insert these prices, and there is only one instance of the application running.
This is running on a SQL Server 2016 Availability Group with 3 servers.
ALTER PROCEDURE [dbo].[mw_UpdatePrice] @CustTypeID INT
,@ID INT
,@Price MONEY
,@OldPrice MONEY
,@ExpirationDate DATETIME
,@PriceStatusID INT
,@PriceStatusDesc VARCHAR(80)
,@FreeFreightShipviaServiceLevelID INT = NULL
,@FreeFreightShipViaServiceLevelDescription VARCHAR(150) = NULL
,@FreeFreightShipViaServiceLevelRank INT = NULL
AS
BEGIN
SET NOCOUNT ON;
IF EXISTS (
SELECT 1
FROM dbo.Price
WHERE ID = @ID
AND CustTypeID = @CustTypeID
)
BEGIN
UPDATE dbo.Price
SET Price = @Price
,OldPrice = @OldPrice
,ExpirationDate = @ExpirationDate
,PriceStatusID = @PriceStatusID
,PriceStatusDescription = @PriceStatusDesc
,ServiceLevelName = @FreeFreightShipViaServiceLevelDescription
,ServiceLevelId = @FreeFreightShipviaServiceLevelID
,ServiceLevelRank = @FreeFreightShipViaServiceLevelRank
WHERE ID = @ID
AND CustTypeID = @CustTypeID
END
ELSE
BEGIN
INSERT dbo.Price (
ID
,CustTypeID
,Price
,OldPrice
,ExpirationDate
,PriceStatusID
,PriceStatusDescription
,ServiceLevelName
,ServiceLevelID
,ServiceLevelRank
)
VALUES (
@ID
,@CustTypeID
,@Price
,@OldPrice
,@ExpirationDate
,@PriceStatusID
,@PriceStatusDesc
,@FreeFreightShipViaServiceLevelDescription
,@FreeFreightShipviaServiceLevelID
,@FreeFreightShipViaServiceLevelRank
)
END
END

As comments have mentioned, this stored procedure is not safe to call concurrently - it has race conditions for both the insert and the update scenarios. This can be fixed only by ensuring that each call to the stored procedure runs in its own transaction (add BEGIN/COMMIT TRAN inside the SP, or start the transaction from the application code), and by applying UPDLOCK, HOLDLOCK to the SELECT and HOLDLOCK to the INSERT:
SELECT 1
FROM dbo.Price WITH (UPDLOCK, HOLDLOCK)
WHERE ID = @ID
AND CustTypeID = @CustTypeID
...
INSERT dbo.Price WITH (HOLDLOCK)
...
Even if this gets refactored to a MERGE statement, HOLDLOCK will still be necessary to prevent the issue. Example here.
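For illustration (not the linked example), a hedged sketch of what such a MERGE with HOLDLOCK might look like for this procedure, using the same parameters and columns as above:
MERGE dbo.Price WITH (HOLDLOCK) AS T
USING (SELECT @ID AS ID, @CustTypeID AS CustTypeID) AS S
    ON T.ID = S.ID
   AND T.CustTypeID = S.CustTypeID
WHEN MATCHED THEN
    UPDATE SET Price = @Price
              ,OldPrice = @OldPrice
              ,ExpirationDate = @ExpirationDate
              ,PriceStatusID = @PriceStatusID
              ,PriceStatusDescription = @PriceStatusDesc
              ,ServiceLevelName = @FreeFreightShipViaServiceLevelDescription
              ,ServiceLevelId = @FreeFreightShipviaServiceLevelID
              ,ServiceLevelRank = @FreeFreightShipViaServiceLevelRank
WHEN NOT MATCHED THEN
    INSERT (ID, CustTypeID, Price, OldPrice, ExpirationDate, PriceStatusID,
            PriceStatusDescription, ServiceLevelName, ServiceLevelID, ServiceLevelRank)
    VALUES (S.ID, S.CustTypeID, @Price, @OldPrice, @ExpirationDate, @PriceStatusID,
            @PriceStatusDesc, @FreeFreightShipViaServiceLevelDescription,
            @FreeFreightShipviaServiceLevelID, @FreeFreightShipViaServiceLevelRank);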
HOLDLOCK holds its (key-range) locks until the end of the transaction and will reduce the concurrent throughput of this SP.
Aside from the concurrency problem, another possible cause of a PRIMARY KEY violation could occur if ANSI_NULLS is on, allowing an application logic error where the SP is called with NULL values for @CustTypeID or @ID to make the procedure behave in a way that wasn't intended. For example, calling with @CustTypeID = NULL and @ID = 1 would always take the INSERT path, which might not be the intended behavior.
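If that kind of call is a possibility, a simple guard at the top of the procedure (a sketch, not part of the original code) makes it fail loudly instead of silently taking the INSERT path:
IF @ID IS NULL OR @CustTypeID IS NULL
BEGIN
    -- Reject calls that can never match an existing row.
    RAISERROR('mw_UpdatePrice called with NULL @ID or @CustTypeID', 16, 1);
    RETURN;
END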
Edit:
Added UPDLOCK as per @DavidBrowne

Related

Row Locking for update

Can anyone confirm this for me? I need to be able to write an 'ownership' value (who owns the record) to a field in a row, and I need it to go to the first person who selects the row for update, ignoring any further selects until the row is available to write to again....
My Transaction will be:
BEGIN TRANSACTION
DECLARE @OwnerField VARCHAR(20)
SELECT @OwnerField = OwnerField
FROM [Table]
WHERE RecordID = 2
IF @OwnerField IS NULL -- Can own
BEGIN
UPDATE [Table]
SET OwnerField = 'John Smith'
WHERE RecordID = 2
END
COMMIT TRANSACTION
As far as my knowledge goes (with Google's help) this will allow me to lock the row, check if there is a value in it, if not then write one, if so then exit..
Does this make sense?
Thank you in advance..
Derek.
Unless you want to handle the contention by producing deadlocks, don't use SERIALIZABLE for this. SERIALIZABLE will take and hold Shared (S) locks in the first query, so concurrent transactions will both read the row, and enter into a deadlock as they both try to update it. One will be killed; the other will succeed, and the SERIALIZABLE semantics are preserved.
Instead you should put a restrictive lock on the target row as you read it.
eg:
BEGIN TRANSACTION
DECLARE @OwnerField VARCHAR(20)
SELECT @OwnerField = OwnerField
FROM [Table] WITH (UPDLOCK, HOLDLOCK)
WHERE RecordID = 2
IF @OwnerField IS NULL -- Can own
BEGIN
UPDATE [Table]
SET OwnerField = 'John Smith'
WHERE RecordID = 2
END
COMMIT TRANSACTION
(UPDLOCK, HOLDLOCK) gives you the same range-locking protection as the SERIALIZABLE isolation level, but uses a restrictive lock, so multiple transactions will block on the SELECT. The second reader will block until the first has committed, and will then see the updated OwnerField column.
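A variation worth noting (a sketch using the same table and column names, not part of the original answer): skip the SELECT entirely and let a single conditional UPDATE claim the row atomically, then check @@ROWCOUNT to see whether you won:
UPDATE [Table]
SET OwnerField = 'John Smith'
WHERE RecordID = 2
  AND OwnerField IS NULL;

-- @@ROWCOUNT = 1 means we claimed it; 0 means someone else already owns it.
IF @@ROWCOUNT = 1
    PRINT 'Row claimed';
ELSE
    PRINT 'Someone else already owns it';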

Generating Unique Random Numbers Efficiently

We are using the technique outlined here to generate random record IDs without collisions. In short, we create a randomly-ordered table of every possible ID, and mark each record as 'Taken' as it is used.
I use the following Stored Procedure to obtain an ID:
ALTER PROCEDURE spc_GetId @retVal BIGINT OUTPUT
AS
DECLARE @curUpdate TABLE (Id BIGINT);
SET NOCOUNT ON;
UPDATE IdMasterList SET Taken=1
OUTPUT DELETED.Id INTO @curUpdate
WHERE ID=(SELECT TOP 1 ID FROM IdMasterList WITH (INDEX(IX_Taken)) WHERE Taken IS NULL ORDER BY SeqNo);
SELECT TOP 1 @retVal=Id FROM @curUpdate;
RETURN;
The retrieval of the ID must be an atomic operation, as simultaneous inserts are possible.
For large inserts (10+ million), the process is quite slow, as I must pass through the table to be inserted via a cursor.
The IdMasterList has a schema:
SeqNo (BIGINT, NOT NULL) (PK) -- sequence of ordered numbers
Id (BIGINT) -- sequence of random numbers
Taken (BIT, NULL) -- 1 if taken, NULL if not
The IX_Taken index is:
CREATE NONCLUSTERED INDEX IX_Taken ON IdMasterList (Taken ASC)
I generally populate a table with Ids in this manner:
DECLARE @recNo BIGINT;
DECLARE @newId BIGINT;
DECLARE newAdds CURSOR FOR SELECT recNo FROM Adds
OPEN newAdds;
FETCH NEXT FROM newAdds INTO @recNo;
WHILE @@FETCH_STATUS=0 BEGIN
EXEC spc_GetId @newId OUTPUT;
UPDATE Adds SET id=@newId WHERE recNo=@recNo;
FETCH NEXT FROM newAdds INTO @recNo;
END;
CLOSE newAdds;
DEALLOCATE newAdds;
Questions:
Is there any way I can improve the SP to extract Ids faster?
Would a conditional (filtered) index improve performance? (I've yet to test, as IdMasterList is very big.)
Is there a better way to populate a table with these Ids?
As with most things in SQL Server, if you are using cursors, you are doing it wrong.
Since you are using SQL Server 2012, you can use a SEQUENCE to keep track of what random value you already used and effectively replace the Taken column.
CREATE SEQUENCE SeqNoSequence
AS bigint
START WITH 1 -- Start with the first SeqNo that is not taken yet
CACHE 1000; -- Increase the cache size if you regularly need large blocks
Usage:
CREATE TABLE #tmp
(
recNo bigint,
SeqNo bigint
)
INSERT INTO #tmp (recNo, SeqNo)
SELECT recNo,
NEXT VALUE FOR SeqNoSequence
FROM Adds
UPDATE a
SET a.id = m.Id
FROM Adds a
INNER JOIN #tmp tmp ON a.recNo = tmp.recNo
INNER JOIN IdMasterList m ON tmp.SeqNo = m.SeqNo
SEQUENCE is atomic. Subsequent calls to NEXT VALUE FOR SeqNoSequence are guaranteed to return unique values, even for parallel processes. Note that there can be gaps in SeqNo, but it's a very small trade off for the huge speed increase.
Put a BIGINT identity PK on each table, then:
insert into [user] (name)
values (...)
update u
set u.ID = i.ID
from [user] u
left join id i
on u.PK = i.PK
where u.ID is null;
or, one row at a time:
insert into [user] (name) values ('justsaynotocursor');
set @PK = SCOPE_IDENTITY();
update [user] set ID = (select ID from id where PK = @PK);
A few ideas that came to my mind:
Try whether removing the TOP + inner SELECT etc. helps to improve the performance of the ID fetching (look at STATISTICS IO & the query plan):
UPDATE TOP (1) IdMasterList
SET @retVal = Id, Taken = 1
WHERE Taken IS NULL
Change the index to be a filtered index, since I assume you don't need to fetch numbers that are already taken. If I remember correctly, you can't do this for NULL values, so you would need to change Taken to be 0/1.
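A hedged sketch of such a filtered index, assuming Taken has been changed to a NOT NULL bit with 0 meaning free (the index name is illustrative):
CREATE NONCLUSTERED INDEX IX_Taken_Free
ON IdMasterList (SeqNo)
INCLUDE (Id)
WHERE Taken = 0;  -- only the not-yet-used rows are kept in the index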
What actually is your problem? Fetching single IDs or 10+ million IDs? Is the problem CPU / I/O etc. caused by the cursor & ID fetching logic, or are the parallel processes being blocked by other processes?
Use a sequence object to get the SeqNo, and then fetch the Id from IdMasterList using the value it returns. This could work if you don't have gaps in the IdMasterList SeqNo sequence.
Using the READPAST hint could help with blocking; for CPU / I/O issues, you should try to optimize the SQL.
If the cause is purely the table being a hotspot, and no other easy solutions seem to help, split it into several tables and use some kind of simple logic (even @@SPID, RAND() or something similar) to decide from which table the ID should be fetched. You would need more checking that all the tables still have free numbers, but it shouldn't be that bad.
Create different procedures (or even tables) to handle fetching of single ID, hundreds of IDs and millions of IDs.
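Regarding the READPAST idea above, a minimal sketch (it assumes the TOP (1) rewrite shown earlier; the hint combination is the usual queue pattern, not tested against your schema):
-- READPAST skips rows another caller has locked instead of blocking behind them.
UPDATE TOP (1) IdMasterList WITH (READPAST, UPDLOCK, ROWLOCK)
SET @retVal = Id,
    Taken = 1
WHERE Taken IS NULL;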

SQL Server, double insert at the exact same time, unicity Bug

I am facing trouble when the following code is called two times almost at the same time.
DECLARE @membershipIdReturn as uniqueidentifier = null
SELECT @membershipIdReturn = MembershipId
FROM [Loyalty].[Membership]
WITH (NOLOCK)
WHERE ContactId = @customerIdFront
AND
IsDeleted = 0
IF (@membershipIdReturn IS NULL)
-- InsertStatementHere
The calls are so close together (about three thousandths of a second apart) that the second call also enters the IF statement. Then a uniqueness violation is raised, because this is not supposed to happen.
Is the bug because of the (NOLOCK)? I need it because of transaction issues.
Is there any workaround for correcting this behavior?
Thanks Al
Two options
1. Use a unique constraint, then put your insert statement in a TRY/CATCH block (see the sketch after option 2):
ALTER TABLE [Loyalty].[Membership]
ADD CONSTRAINT uc_ContactId_IsDeleted UNIQUE(ContactId, IsDeleted)
2. Use MERGE with the SERIALIZABLE hint, so that there is no gap between the select and the insert.
MERGE [Loyalty].[Membership] WITH (SERIALIZABLE) AS T
USING (SELECT @customerIdFront AS ContactId) AS S
ON T.ContactId = S.ContactId
AND T.IsDeleted = 0
WHEN NOT MATCHED THEN
INSERT (ContactId, MemberName, MemberTel) VALUES (S.ContactId, '', '');
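A hedged sketch of option 1 with a TRY/CATCH around the insert (the MemberName/MemberTel columns follow the MERGE above; anything else is assumed):
BEGIN TRY
    INSERT INTO [Loyalty].[Membership] (ContactId, IsDeleted, MemberName, MemberTel)
    VALUES (@customerIdFront, 0, '', '');
END TRY
BEGIN CATCH
    -- 2601 / 2627 = duplicate key: another call inserted the same membership first,
    -- so fall through and read the existing row instead of failing.
    IF ERROR_NUMBER() NOT IN (2601, 2627)
        THROW;
END CATCH

SELECT @membershipIdReturn = MembershipId
FROM [Loyalty].[Membership]
WHERE ContactId = @customerIdFront
  AND IsDeleted = 0;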

SQL Server select for update

I am struggling to find a SQL Server replacement for select for update that works.
I have a master table that contains a column which holds the next order number. The application does a SELECT FOR UPDATE on this row, reads the current value (while locked), adds one to this value, updates the row, and then uses the number it received. This process works perfectly on every database I've tried, except SQL Server, which does not seem to have any mechanism for selecting data for exclusive use.
How do I do a locked read and update of something like a next order number from a sequence table in SQL Server?
BTW, I know I can use things like IDENTITY cols and stuff, to do this, but in this case I must read from this existing column. Get the value and inc it, and do it in a safe locked manner to avoid 2 users getting the same value.
UPDATE:
Thank you, that works for this case :)
DECLARE @Output char(30)
UPDATE scheme.sysdirm
SET @Output = key_value = cast(key_value as int)+1
WHERE system_key='OPLASTORD'
SELECT @Output
I have one other place I do something similar. I read and lock a stock record too.
SELECT STOCK
FROM PRODUCT
WHERE ID = ? FOR UPDATE.
I then do some validation and then do
UPDATE PRODUCT SET STOCK = ?
WHERE ID=?
I can't just use your above method here, as the value I update is based on things I compute from the stock I read. But I need to ensure no one else can mess with the stock while I do this. Again, this is easy on other DBs with SELECT FOR UPDATE... is there a SQL Server workaround? :)
You can simply do an UPDATE that also reads out the new value into a SQL Server variable:
DECLARE @Output INT
UPDATE dbo.YourTable
SET @Output = YourColumn = YourColumn + 1
WHERE ID = ????
SELECT @Output
Since it's an atomic UPDATE statement, it's safe against concurrency issues (only one connection can hold an update lock on the row at any given time). A second session that wants to get the incremented value at the same time will have to wait until the first one completes, and will thus get the next value from the table.
As an alternative you can use the OUTPUT clause of the UPDATE statement, although this will insert into a table variable.
Create table YourTable
(
ID int,
YourColumn int
)
GO
INSERT INTO YourTable VALUES (1, 1)
GO
DECLARE @Output TABLE
(
YourColumn int
)
UPDATE YourTable
SET YourColumn = YourColumn + 1
OUTPUT inserted.YourColumn INTO @Output
WHERE ID = 1
SELECT TOP 1 YourColumn
FROM @Output
**** EDIT
If you want to ensure that no one can change the data after you have read it, you can use REPEATABLE READ. Be aware that any rows you read will then stay locked until the end of the transaction (pessimistic locking), which may cause deadlocking. You can also use a SELECT ... FROM Table WITH (UPDLOCK) hint within a transaction.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRANSACTION
SELECT STOCK
FROM PRODUCT
WHERE ID = ?
.....
...
UPDATE Product
SET Stock = nnn
WHERE ID = ?
COMMIT TRANSACTION
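For the stock scenario specifically, the UPDLOCK variant mentioned above might look like this (a sketch reusing the question's PRODUCT/STOCK names; the literal 42 is just a placeholder for the ? parameter):
BEGIN TRANSACTION;

DECLARE @Stock INT;

-- UPDLOCK makes the read take an update lock, so no other session doing the
-- same UPDLOCK read (or an update) can get at the row until we commit.
SELECT @Stock = STOCK
FROM PRODUCT WITH (UPDLOCK)
WHERE ID = 42;

-- ... validation / calculations based on @Stock ...

UPDATE PRODUCT
SET STOCK = @Stock - 1
WHERE ID = 42;

COMMIT TRANSACTION;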

Transaction Concurrency Isolation - Why can I update a subset of another transactions records?

I'm trying to understand a problem I have run into that I don't believe should be possible when dealing with transactions using the read committed isolation level. I have a table that is being used as a queue. In one thread (connection 1) I insert multiple batches of 20 records into the table. Each batch of 20 records is performed inside a transaction. In a second thread (connection 2) I perform an update to change the status of the records that have been inserted into the queue, which also occurs inside a transaction. When running concurrently, it is my expectation that the number of rows affected by the update (connection 2) should be a multiple of 20, since connection 1 inserts rows into the table in batches of 20 within a transaction.
But my testing shows this is not always the case, and on occasion I'm able to update a subset of records from connection 1's batch. Should this be possible or am I missing something about transactions, concurrency, and isolation levels? Below is a set of test scripts I created to reproduce this issue in T-SQL.
This script inserts 20,000 records into the table in transaction batches of 20.
USE ReadTest
GO
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
GO
SET NOCOUNT ON
DECLARE @trans_id INTEGER
DECLARE @cmd_id INTEGER
DECLARE @text_str VARCHAR(4000)
SET @trans_id = 0
SET @text_str = 'Placeholder String Value'
-- First empty the table
DELETE FROM TABLE_A
WHILE @trans_id < 1000 BEGIN
SET @trans_id = @trans_id + 1
SET @cmd_id = 0
BEGIN TRANSACTION
-- Insert 20 records into the table per transaction
WHILE @cmd_id < 20 BEGIN
SET @cmd_id = @cmd_id + 1
INSERT INTO TABLE_A ( transaction_id, command_id, [type], status, text_field )
VALUES ( @trans_id, @cmd_id, 1, 1, @text_str )
END
COMMIT
END
PRINT 'DONE'
This script updates the records in the table, changing the status from 1 to 2, and then checks the rowcount from the update operation. When the rowcount is not a multiple of 20, a print statement indicates this and reports the number of rows affected.
USE ReadTest
GO
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
GO
SET NOCOUNT ON
DECLARE @loop_counter INTEGER
DECLARE @trans_id INTEGER
DECLARE @count INTEGER
SET @loop_counter = 0
WHILE @loop_counter < 100000 BEGIN
SET @loop_counter = @loop_counter + 1
BEGIN TRANSACTION
UPDATE TABLE_A SET status = 2
WHERE status = 1
and type = 1
SET @count = @@ROWCOUNT
COMMIT
IF ( @count % 20 <> 0 ) BEGIN
-- Records in concurrent transaction inserting in batches of 20 records before commit.
PRINT '*** Rowcount not a multiple of 20. Count = ' + CAST(@count AS VARCHAR) + ' ***'
END
IF @count > 0 BEGIN
-- Delete the records where the status was changed.
DELETE TABLE_A WHERE status = 2
END
END
PRINT 'DONE'
This script creates the test queue table in a new database called ReadTest.
USE master;
GO
IF EXISTS (SELECT * FROM sys.databases WHERE name = 'ReadTest')
BEGIN;
DROP DATABASE ReadTest;
END;
GO
CREATE DATABASE ReadTest;
GO
ALTER DATABASE ReadTest
SET ALLOW_SNAPSHOT_ISOLATION OFF
GO
ALTER DATABASE ReadTest
SET READ_COMMITTED_SNAPSHOT OFF
GO
USE ReadTest
GO
CREATE TABLE [dbo].[TABLE_A](
[ROWGUIDE] [uniqueidentifier] NOT NULL,
[TRANSACTION_ID] [int] NOT NULL,
[COMMAND_ID] [int] NOT NULL,
[TYPE] [int] NOT NULL,
[STATUS] [int] NOT NULL,
[TEXT_FIELD] [varchar](4000) NULL,
CONSTRAINT [PK_TABLE_A] PRIMARY KEY NONCLUSTERED
(
[ROWGUIDE] ASC
) ON [PRIMARY]
) ON [PRIMARY]
ALTER TABLE [dbo].[TABLE_A] ADD DEFAULT (newsequentialid()) FOR [ROWGUIDE]
GO
Your expectations are completely misplaced. You have never expressed in your query the requirement to 'dequeue' exactly 20 rows. The UPDATE can return 0, 19, 20, 21 or 1000 rows and all results are correct, as long as the status is 1 and the type is 1. If you expect the 'dequeue' to occur in the order of the 'enqueue' (which is somewhat alluded to in your question, but never explicitly stated), then your 'dequeue' operation must contain an ORDER BY clause. Had you added such an explicitly stated requirement, then your expectation that the 'dequeue' always returns an entire batch of 'enqueue' rows (ie. a multiple of 20 rows) would be one step closer to being a reasonable expectation. As things stand right now it is, as I said, completely misplaced.
For a lengthier discussion see Using Tables as Queues.
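To make that concrete, a minimal ordered-dequeue sketch in the spirit of that article (it assumes a supporting index on the ORDER BY columns; even so, as noted above, READ COMMITTED still makes no guarantee about seeing whole batches):
;WITH next_batch AS
(
    -- Take the 20 oldest pending rows in enqueue order, skipping rows locked by other dequeuers.
    SELECT TOP (20) status
    FROM TABLE_A WITH (UPDLOCK, READPAST, ROWLOCK)
    WHERE status = 1
      AND [type] = 1
    ORDER BY transaction_id, command_id
)
UPDATE next_batch
SET status = 2;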
I shouldn't be concerned that while one transaction is committing a
batch of 20 inserted records, another concurrent transaction is only
able to update a subset of those records and not all 20?
Basically the question boils down to: if I SELECT while I INSERT, how many inserted rows will I see? You only have a right to be concerned if the isolation level is declared as SERIALIZABLE. None of the other isolation levels make any prediction about how many of the rows inserted while the UPDATE was running will be visible. Only SERIALIZABLE states that the outcome has to be the same as running the two statements one after another (ie. serialized, hence the name). While the technical details of how the UPDATE 'sees' only part of the INSERT batch are easy to understand once you consider physical order and the lack of an ORDER BY clause, the explanation is irrelevant. The fundamental issue is that the expectation is unwarranted. Even if the 'issue' is 'fixed' by adding a proper ORDER BY and the correct clustered index key (the article linked above explains the details), the expectation is still unwarranted. It will still be perfectly legal for the UPDATE to 'see' 1, 19 or 21 rows, although it will be unlikely to happen.
I guess I've always understood READ COMMITTED to only read committed
data, and that a transaction commit is an atomic operation, making all
the changes that occurred in the transaction available at once.
That is correct. What is incorrect is to expect a concurrent SELECT (or UPDATE) to see the entire change, regardless of where it happens to be in its execution. Open an SSMS query and run the following:
use tempdb;
go
create table test (a int not null primary key, b int);
go
insert into test (a, b) values (5,0)
go
begin transaction
insert into test (a, b) values (10,0)
Now open a new SSMS query and run the following:
update test
set b=1
output inserted.*
where b=0
This will block behind the uncommitted INSERT. Now go back to first query and run the following:
insert into test (a, b) values (1,0)
commit
When this commits, the second SSMS query will finish, and it will return two rows, not three. QED. This is READ COMMITTED. What you expect is SERIALIZABLE execution (in which case the example above will deadlock).
It could happen like this:
The writer/inserter writes 20 rows (does not commit)
The reader/updater reads one row (which is not committed - it discards it)
The writer/inserter commits
The reader/updater reads 19 rows which are now committed thus visible
I believe that only an isolation level of SERIALIZABLE (or snapshot isolation, which is more concurrent) fixes this.
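If you go the snapshot isolation route, a minimal sketch against the ReadTest database from the setup script (which currently turns it off):
ALTER DATABASE ReadTest SET ALLOW_SNAPSHOT_ISOLATION ON;
GO

-- In the updater connection: the statement now sees only rows committed
-- before the snapshot transaction began, i.e. whole 20-row batches from the inserter.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;

UPDATE TABLE_A SET status = 2
WHERE status = 1
  AND [type] = 1;

COMMIT;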
