I'm developing a site in ASP.NET MVC that should generate invoices.
After some reading about generating invoice numbers, my understanding is that I should use a trigger or a stored procedure to be sure the number is generated correctly, with no skipped or duplicated numbers due to concurrent inserts of invoices.
From what I understand, the best approach would be to create a trigger, in the AFTER INSERT of my invoices table, that does the work in a single transaction.
So I came up with this, which seems to work nicely (more tests this weekend):
CREATE TRIGGER Invoices_SetInvoiceNum ON Invoices AFTER INSERT
AS
DECLARE @next int
BEGIN TRAN
SET NOCOUNT ON
UPDATE counters SET @next = InvoiceNum = InvoiceNum + 1
UPDATE Invoices SET InvoiceNum = @next FROM inserted i JOIN Invoices p ON p.Id = i.Id
IF @@ERROR = 0
    COMMIT TRAN
ELSE
    ROLLBACK TRAN
Is this a good approach, or is there a better way to be sure that the number will never be duplicated or skipped?
Is it also safer to use a table/record lock? If so, could you suggest how to integrate it into my trigger?
Thanks in advance for any insight on this topic.
A better approach would be to make the invoice column an IDENTITY column and let it increment itself.
CREATE TABLE Invoices
(
InvoiceNum int IDENTITY(1,1),
AccountNum int,
OtherStuff varchar(30)
);
See this runnable example
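For illustration outside SQL Server, the same idea (let the database assign the number instead of computing it yourself) can be sketched with SQLite via Python. The table and column names mirror the example above; SQLite's `INTEGER PRIMARY KEY AUTOINCREMENT` plays the role of `IDENTITY(1,1)`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY AUTOINCREMENT behaves like an auto-incrementing identity column
conn.execute("""
    CREATE TABLE Invoices (
        InvoiceNum INTEGER PRIMARY KEY AUTOINCREMENT,
        AccountNum INTEGER,
        OtherStuff TEXT
    )
""")
# Never mention InvoiceNum in the INSERT; the engine assigns it atomically
conn.execute("INSERT INTO Invoices (AccountNum, OtherStuff) VALUES (?, ?)", (100, "first"))
conn.execute("INSERT INTO Invoices (AccountNum, OtherStuff) VALUES (?, ?)", (200, "second"))
nums = [r[0] for r in conn.execute("SELECT InvoiceNum FROM Invoices ORDER BY InvoiceNum")]
print(nums)  # [1, 2]
```

One caveat that applies to SQL Server too: identity-style counters guarantee uniqueness but not gaplessness, since a rolled-back insert consumes a number.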
Related
I created a stored procedure to implement rate limiting on my API, this is called about 5-10k times a second and each day I'm noticing dupes in the counter table.
It looks up the API key being passed in, then checks the counter table for that ID and date combination using an "UPSERT": if it finds a row it does an UPDATE of [count]+1, and if not it INSERTs a new row.
There is no primary key in the counter table.
Here is the stored procedure:
USE [omdb]
GO
/****** Object: StoredProcedure [dbo].[CheckKey] Script Date: 6/17/2017 10:39:37 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[CheckKey] (
    @apikey AS VARCHAR(10)
)
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @userID AS int
    DECLARE @limit AS int
    DECLARE @curCount AS int
    DECLARE @curDate AS Date = GETDATE()
    SELECT @userID = id, @limit = limit FROM [users] WHERE apiKey = @apikey
    IF @userID IS NULL
    BEGIN
        --Key not found
        SELECT 'False' as [Response], 'Invalid API key!' as [Reason]
    END
    ELSE
    BEGIN
        --Key found
        BEGIN TRANSACTION Upsert
        MERGE [counter] AS t
        USING (SELECT @userID AS ID) AS s
        ON t.[ID] = s.[ID] AND t.[date] = @curDate
        WHEN MATCHED THEN UPDATE SET t.[count] = t.[count] + 1
        WHEN NOT MATCHED THEN INSERT ([ID], [date], [count]) VALUES (@userID, @curDate, 1);
        COMMIT TRANSACTION Upsert
        SELECT @curCount = [count] FROM [counter] WHERE ID = @userID AND [date] = @curDate
        IF @limit IS NOT NULL AND @curCount > @limit
        BEGIN
            SELECT 'False' as [Response], 'Request limit reached!' as [Reason]
        END
        ELSE
        BEGIN
            SELECT 'True' as [Response], NULL as [Reason]
        END
    END
END
I also think some blocking is happening since I introduced this SP.
The dupes aren't breaking anything, but I'm curious if it's something fundamentally wrong with my code or if I should setup a constraint in the table to prevent this. Thanks
Update 6/23/17: I dropped the MERGE statement and tried using @@ROWCOUNT, but it also caused dupes:
BEGIN TRANSACTION Upsert
UPDATE [counter] SET [count] = [count] + 1 WHERE [ID] = @userID AND [date] = @curDate
IF @@ROWCOUNT = 0 AND @@ERROR = 0
    INSERT INTO [counter] ([ID], [date], [count]) VALUES (@userID, @curDate, 1)
COMMIT TRANSACTION Upsert
A HOLDLOCK hint on the update statement will avoid the race condition. To prevent deadlocks, I suggest a clustered composite primary key (or unique index) on ID and date.
The example below incorporates these changes and uses the SET <variable> = <column> = <expression> form of the SET clause to avoid the need for the subsequent SELECT of the final counter value and thereby improve performance.
ALTER PROCEDURE [dbo].[CheckKey]
    @apikey AS VARCHAR(10)
AS
SET NOCOUNT ON;
--SET XACT_ABORT ON is a best practice for procs with explicit transactions
SET XACT_ABORT ON;
DECLARE
      @userID AS int
    , @limit AS int
    , @curCount AS int
    , @curDate AS Date = GETDATE();
BEGIN TRY
    SELECT
          @userID = id
        , @limit = limit
    FROM [users]
    WHERE apiKey = @apikey;
    IF @userID IS NULL
    BEGIN
        --Key not found
        SELECT 'False' as [Response], 'Invalid API key!' as [Reason];
    END
    ELSE
    BEGIN
        --Key found
        BEGIN TRANSACTION Upsert;
        UPDATE [counter] WITH(HOLDLOCK)
        SET @curCount = [count] = [count] + 1
        WHERE
            [ID] = @userID
            AND [date] = @curDate;
        IF @@ROWCOUNT = 0
        BEGIN
            INSERT INTO [counter] ([ID], [date], [count])
            VALUES (@userID, @curDate, 1);
            SET @curCount = 1; --first request of the day
        END;
        IF @limit IS NOT NULL AND @curCount > @limit
        BEGIN
            SELECT 'False' as [Response], 'Request limit reached!' as [Reason]
        END
        ELSE
        BEGIN
            SELECT 'True' as [Response], NULL as [Reason]
        END;
        COMMIT TRANSACTION Upsert;
    END;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK;
    THROW;
END CATCH;
GO
Probably not the answer you're looking for but for a rate-limiting counter I would use a cache like Redis in a middleware before hitting the API. Performance-wise it's pretty great since Redis would have no problem with the load and your DB won’t be impacted.
And if you want to keep a history of hits per api key per day in SQL, run a daily task to import yesterday's counts from Redis to SQL.
The data-set would be small enough to get a Redis instance that would cost literally nothing (or close).
It will be the MERGE statement getting into a race condition with itself: your API is being called twice by the same client, and both times the merge statement finds no row, so each inserts one. MERGE isn't an atomic operation, even though it's reasonable to assume it is. For example, see this bug report for SQL 2008 about merge causing deadlocks; the SQL Server team said this is by design.
From your post I think the immediate issue is that your clients will be potentially getting a small number of free hits on your API. For example if two requests come in and see no row you'll start with two rows with a count of 1 when you'd actually want one row with a count of 2 and the client could end up getting 1 free API hit that day. If three requests crossed over you'd get three rows with a count of 1 and they could get 2 free API hits, etc.
Edit
So, as your link suggests, you've got two categories of options to explore: first, just try to get this working in SQL Server; second, other architectural solutions.
For the SQL option, I would do away with the merge and consider pre-populating your clients ahead of time, either nightly or less often for several days at a time; this leaves you a single update instead of the merge/update-and-insert. Then you can confirm both your update and your select are fully optimised, i.e. that they have the necessary indexes and aren't causing scans. Next you could look at tweaking locking so you're only locking at the row level; see this for some more info. For the select you could also look at using NOLOCK, which means you could get slightly incorrect data, but this shouldn't matter in your case, and you'll always be using a WHERE that targets a single row anyway.
For the non-SQL options, as your link says, you could look at queuing things up; obviously these would be the updates/inserts, so your selects would be seeing old data. This may or may not be acceptable depending on how far apart they are, although you could treat it as an "eventually consistent" solution if you wanted to be strict and charge extra or take off API hits the next day. You could also look at caching options to store the counts; this gets more complex if your app is distributed, but there are caching solutions for that. If you went with caching you could choose not to persist anything, but then you'd potentially give away a load of free hits if your site went down, though you'd probably have bigger issues to worry about then anyway!
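As an aside, engines that support an atomic single-statement upsert sidestep the merge race entirely. A minimal sketch with SQLite via Python, using a hypothetical schema that mirrors the counter table (SQLite's `INSERT ... ON CONFLICT DO UPDATE` is not SQL Server syntax, it just illustrates the shape of an atomic upsert):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A composite primary key on (ID, date) makes duplicate counter rows impossible
conn.execute("""
    CREATE TABLE counter (
        ID INTEGER NOT NULL,
        date TEXT NOT NULL,
        count INTEGER NOT NULL,
        PRIMARY KEY (ID, date)
    )
""")

def hit(conn, user_id, day):
    # One atomic statement: insert the first hit of the day, or bump the existing count
    conn.execute("""
        INSERT INTO counter (ID, date, count) VALUES (?, ?, 1)
        ON CONFLICT (ID, date) DO UPDATE SET count = count + 1
    """, (user_id, day))

for _ in range(3):
    hit(conn, 42, "2017-06-17")

rows = conn.execute("SELECT ID, date, count FROM counter").fetchall()
print(rows)  # [(42, '2017-06-17', 3)]
```

The key design point carries over to SQL Server: the unique key makes dupes structurally impossible, and the read-modify-write happens inside one statement rather than across two.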
At a high level, have you considered pursuing the following scenario?
Restructuring: Set the primary key on your table to be a composite of (ID, date). Possibly even better, just use the API key itself instead of the arbitrary ID you're assigning it.
Query A: Do SQL Server's equivalent of "INSERT IGNORE" (it seems there are semantic equivalents for SQL Server, based on a Google search) with the values (ID, TODAY(), 1). You'll also want to specify a WHERE clause that checks the ID actually exists in your API/limits table.
Query B: Update the row with (ID, TODAY()) as its primary key, setting count := count + 1, and in the very same query, do an inner join with your limits table, so that in the where clause you can specify that you'll only update the count if the count < limit.
If the majority of your requests are valid API requests or rate-limited requests, I would perform queries in the following order on every request:
Run query B.
  If 1 row updated: continue.
  If 0 rows updated:
    Run query A.
      If 1 row updated: continue.
      If 0 rows updated:
        Run query B.
          If 1 row updated: continue.
          If 0 rows updated: reject because of rate limit.
If the majority of your requests are invalid API requests, I'd do the following:
Run query A.
  If 1 row updated: continue.
  If 0 rows updated:
    Run query B.
      If 1 row updated: continue.
      If 0 rows updated: reject because of rate limit.
I know at least three ways to insert a record if it doesn't already exist in a table:
The first one is using if not exist:
IF NOT EXISTS(select 1 from table where <condition>)
INSERT...VALUES
The second one is using merge:
MERGE table AS target
USING (SELECT values) AS source
ON (condition)
WHEN NOT MATCHED THEN
INSERT ... VALUES ...
The third one is using insert...select:
INSERT INTO table (<values list>)
SELECT <values list>
WHERE NOT EXISTS(select 1 from table where <condition>)
But which one is the best?
The first option does not seem to be thread-safe: if two or more users try to insert the same record, it might be inserted between the select statement in the if and the insert statement that follows.
As for the second option, merge seems to be an overkill for this, as the documentation states:
Performance Tip: The conditional behavior described for the MERGE statement works best when the two tables have a complex mixture of matching characteristics. For example, inserting a row if it does not exist, or updating the row if it does match. When simply updating one table based on the rows of another table, improved performance and scalability can be achieved with basic INSERT, UPDATE, and DELETE statements.
So I think the third option is the best for this scenario (only insert the record if it doesn't already exist, no need to update if it does), but I would like to know what SQL Server experts think.
Please note that after the insert, I'm not interested to know whether the record was already there or whether it's a brand new record, I just need it to be there so that I can carry on with the rest of the stored procedure.
When you need to guarantee the uniqueness of records on a condition that can not to be expressed by a UNIQUE or PRIMARY KEY constraint, you indeed need to make sure that the check for existence and insert are being done in one transaction. You can achieve this by either:
Using one SQL statement performing the check and the insert (your third option)
Using a transaction with the appropriate isolation level
There is a fourth way though that will help you better structure your code and also make it work in situations where you need to process a batch of records at once. You can create a TABLE variable or a temporary table, insert all of the records that need to be inserted in there and then write the INSERT, UPDATE and DELETE statements based on this variable.
Below is (pseudo)code demonstrating this approach:
-- Logic to create the data to be inserted if necessary
DECLARE #toInsert TABLE (idCol INT PRIMARY KEY,dataCol VARCHAR(MAX))
INSERT INTO #toInsert (idCol,dataCol) VALUES (1,'row 1'),(2,'row 2'),(3,'row 3')
-- Logic to insert the data
INSERT INTO realTable (idCol,dataCol)
SELECT TI.*
FROM #toInsert TI
WHERE NOT EXISTS (SELECT 1 FROM realTable RT WHERE RT.dataCol=TI.dataCol)
In many situations I use this approach as it makes the TSQL code easier to read, possible to refactor and apply unit tests to.
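The staging-table pattern above can be exercised end to end with SQLite via Python (table and column names copied from the pseudocode; a temp table stands in for the TABLE variable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE realTable (idCol INTEGER PRIMARY KEY, dataCol TEXT)")
conn.execute("INSERT INTO realTable VALUES (2, 'row 2')")  # one row already present

# Stage the whole batch first, mirroring the @toInsert TABLE variable
conn.execute("CREATE TEMP TABLE toInsert (idCol INTEGER PRIMARY KEY, dataCol TEXT)")
conn.executemany("INSERT INTO toInsert VALUES (?, ?)",
                 [(1, "row 1"), (2, "row 2"), (3, "row 3")])

# One statement then inserts only the rows not already in the target
conn.execute("""
    INSERT INTO realTable (idCol, dataCol)
    SELECT TI.idCol, TI.dataCol
    FROM toInsert TI
    WHERE NOT EXISTS (SELECT 1 FROM realTable RT WHERE RT.dataCol = TI.dataCol)
""")
rows = conn.execute("SELECT idCol, dataCol FROM realTable ORDER BY idCol").fetchall()
print(rows)  # [(1, 'row 1'), (2, 'row 2'), (3, 'row 3')]
```

Because the existence check and the insert run as one statement per batch, there is no window between "check" and "insert" within the statement itself, which is exactly why this shape is easier to reason about than if-exists-then-insert.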
Following Vladimir Baranov's comment, reading Dan Guzman's blog posts about Conditional INSERT/UPDATE Race Condition and “UPSERT” Race Condition With MERGE, seems like all three options suffers from the same drawbacks in a multi-user environment.
Eliminating the merge option as an overkill, we are left with options 1 and 3.
Dan's proposed solution is to use an explicit transaction and add lock hints to the select to avoid race condition.
This way, option 1 becomes:
BEGIN TRANSACTION
IF NOT EXISTS(select 1 from table WITH (UPDLOCK, HOLDLOCK) where <condition>)
BEGIN
INSERT...VALUES
END
COMMIT TRANSACTION
and option 3 becomes:
BEGIN TRANSACTION
INSERT INTO table (<values list>)
SELECT <values list>
WHERE NOT EXISTS(select 1 from table WITH (UPDLOCK, HOLDLOCK) where <condition>)
COMMIT TRANSACTION
Of course, in both options there need to be some error handling - every transaction should use a try...catch so that we can rollback the transaction in case of an error.
That being said, I think the 3rd option is probably my personal favorite, but I don't think there should be a difference.
Update
Following a conversation I've had with Aaron Bertrand in the comments of some other question - I'm not entirely convinced that using ISOLATION LEVEL is a better solution than individual query hints, but at least that's another option to consider:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
INSERT INTO table (<values list>)
SELECT <values list>
WHERE NOT EXISTS(select 1 from table where <condition>);
COMMIT TRANSACTION;
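To make the single-statement shape of option 3 concrete, here is a runnable sketch with SQLite via Python (hypothetical table `t`; SQLite's locking model differs from SQL Server's, so the hints above have no direct equivalent, but the one-statement check-and-insert is the same idea):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k TEXT PRIMARY KEY, v TEXT)")

def insert_if_missing(conn, k, v):
    # One statement: the row is inserted only when no row with this key exists
    before = conn.total_changes
    conn.execute(
        "INSERT INTO t (k, v) SELECT ?, ? WHERE NOT EXISTS (SELECT 1 FROM t WHERE k = ?)",
        (k, v, k),
    )
    return conn.total_changes - before  # 1 if inserted, 0 if it already existed

first = insert_if_missing(conn, "a", "one")
second = insert_if_missing(conn, "a", "two")
print(first, second)  # 1 0
```

Note that the primary key on `k` is still doing the real enforcement; the NOT EXISTS merely avoids a constraint violation in the common case.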
I've been searching around for a question similar to this for a while now, and I haven't found anything, so if this has been asked before, this will at least serve as a good pointer for those ignorant of the proper nomenclature.
I want to INSERT INTO a table if a row doesn't already exist, based on a unique key. If it does exist, then I want to get the primary key Id of that row.
Imagine a table that holds email addresses:
EmailAddressId(PK) | EmailAddress(UK)
I want to INSERT into that table a new Email Address, but there is a unique constraint on EmailAddress. Thus, if the new Email Address is the same as an existing, the INSERT will fail. In that case, I want to select the existing EmailAddressId from the database for the EmailAddress.
I want to do this in the fewest number of operations, assuming that collisions will be a rare case.
Thus, I setup a TRY...CATCH block within a Stored Procedure as follows:
ALTER PROCEDURE [dbo].[EmailAddressWrite]
    @EmailAddress nvarchar(256)
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION
    DECLARE @EmailAddressId INT
    BEGIN TRY
        INSERT INTO EmailAddress VALUES (@EmailAddress)
        SET @EmailAddressId = (SELECT SCOPE_IDENTITY())
    END TRY
    BEGIN CATCH
        SET @EmailAddressId = (SELECT EmailAddressId FROM EmailAddress WHERE EmailAddress = @EmailAddress)
    END CATCH
    --Do some more stuff with the Id now.
    COMMIT TRANSACTION
    RETURN @EmailAddressId
END
The code above functions, and produces the required result, but the Internet makes me think that using TRY...CATCH in this fashion might be slow...thus I'm unsure if this is an optimal solution.
I've only found one other solution which is to SELECT first, and INSERT second. This would result in 2 operations almost all of the time, as I am anticipating very few duplicate email addresses (at least for a month or more).
Is this the optimal solution to achieve 1 operation on INSERT and 2 operations on INSERT fail?
What other solutions can achieve 1 operation on INSERT and 2 operations on INSERT fail?
If I've misused any terminology, please correct it.
DECLARE @id INT
DECLARE @newid TABLE
(
    emailAddressId INT NOT NULL PRIMARY KEY
)
;
WITH t AS
(
    SELECT *
    FROM emailAddress WITH (ROWLOCK, HOLDLOCK)
    WHERE emailAddress = @emailAddress
)
MERGE
INTO t
USING (
    SELECT @emailAddress
) s (emailAddress)
ON 1 = 1
WHEN NOT MATCHED BY TARGET THEN
    INSERT (emailAddress)
    VALUES (emailAddress)
WHEN MATCHED THEN
    UPDATE
    SET @id = 1
OUTPUT INSERTED.emailAddressId
INTO @newid
;
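For comparison, the optimistic try-insert-then-lookup round trip from the question can be sketched with SQLite via Python (table names copied from the question; the unique constraint does the collision detection, and the fallback SELECT only runs on the rare duplicate):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EmailAddress (
        EmailAddressId INTEGER PRIMARY KEY,
        EmailAddress TEXT NOT NULL UNIQUE
    )
""")

def email_address_write(conn, email):
    # Optimistic insert: one operation in the common (new address) case,
    # a second lookup only when the unique constraint fires.
    try:
        cur = conn.execute("INSERT INTO EmailAddress (EmailAddress) VALUES (?)", (email,))
        return cur.lastrowid
    except sqlite3.IntegrityError:
        row = conn.execute(
            "SELECT EmailAddressId FROM EmailAddress WHERE EmailAddress = ?", (email,)
        ).fetchone()
        return row[0]

a = email_address_write(conn, "x@example.com")
b = email_address_write(conn, "x@example.com")
print(a, b)  # 1 1
```

This matches the 1-operation-on-success, 2-on-collision cost profile the question asks about; whether the exception path is slower than a pre-check depends on how rare collisions really are.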
I have a table to which data is continuously added at a rapid pace.
I need to fetch records from this table and immediately remove them, so I cannot process the same record a second time. Since data is added at a fast rate, I need to use the TOP clause so only a small number of records go to the business logic for processing at a time.
I am using the query below:
BEGIN TRAN readrowdata
SELECT TOP 5
    [RawDataId],
    [RawData]
FROM
    [TABLE] WITH (HOLDLOCK);

WITH q AS
(
    SELECT TOP 5
        [RawDataId],
        [RawData]
    FROM
        [TABLE] WITH (HOLDLOCK)
)
DELETE FROM q
COMMIT TRANSACTION readrowdata
I am using HOLDLOCK here so new data cannot be inserted into the table while I am performing the SELECT and DELETE operations. I used it because, suppose there are only 3 records in the table: the SELECT statement will get 3 records, and if a new record gets inserted at the same time, the DELETE statement will delete 4 records, so I would lose 1 record.
Is the query OK in performance terms? If I can improve it, please give me your suggestions.
Thank you
Personally, I'd use a different approach. One with less locking, but also extra information signifying that certain records are currently being processed...
DECLARE @rowsBeingProcessed TABLE (
    id INT
);
WITH rows AS (
    SELECT TOP 5 [RawDataId], [processing_start] FROM yourTable WHERE processing_start IS NULL
)
UPDATE rows SET processing_start = GETDATE()
OUTPUT INSERTED.RawDataId INTO @rowsBeingProcessed (id);
-- Business Logic Here
DELETE yourTable WHERE RawDataId IN (SELECT id FROM @rowsBeingProcessed);
Then you can also add checks like "if a record has been 'beingProcessed' for more than 10 minutes, assume that the business logic failed", etc, etc.
By locking the table in this way, you force other processes to wait for your transaction to complete. This can have very rapid consequences on scalability and performance - and it tends to be hard to predict, because there's often a chain of components all relying on your database.
If you have multiple clients each running this query, and multiple clients adding new rows to the table, the overall system performance is likely to deteriorate at some times, as each "read" client is waiting for a lock, the number of "write" clients waiting to insert data grows, and they in turn may tie up other components (whatever is generating the data you want to insert).
Diego's answer is on the money - put the data into a variable, and delete matching rows. Don't use locks in SQL Server if you can possibly avoid it!
You can do it very easily with TRIGGERS. Below is the kind of setup that lets you avoid holding up other users who are trying to insert data simultaneously:
Data Definition language
CREATE TABLE SampleTable
(
id int
)
Sample Record
insert into SampleTable(id)Values(1)
Sample Trigger
CREATE TRIGGER SampleTableTrigger
ON SampleTable AFTER INSERT
AS
IF EXISTS (SELECT id FROM INSERTED)
BEGIN
    SET NOCOUNT ON
    SET XACT_ABORT ON
    BEGIN TRY
        BEGIN TRAN
        SELECT ID FROM Inserted
        DELETE FROM yourTable WHERE ID IN (SELECT id FROM Inserted);
        COMMIT TRAN
    END TRY
    BEGIN CATCH
        ROLLBACK TRAN
    END CATCH
END
Hope this is simple and helpful.
If I understand you correctly, you are worried that between your select and your delete more records would be inserted, and the first TOP 5 would be different from the second TOP 5?
If so, why don't you load your first select into a temp table or variable (or at least the PKs), do whatever you have to do with your data, and then do your delete based on this table?
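That approach can be sketched with SQLite via Python (hypothetical table names): capture the batch's keys inside one write transaction and delete exactly those keys, so concurrent inserts can never widen the delete. SQLite's whole-database write lock stands in for SQL Server's finer-grained locking here, so this only illustrates the shape of the fix:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manage transactions manually
conn.execute("CREATE TABLE raw (RawDataId INTEGER PRIMARY KEY, RawData TEXT)")
conn.executemany("INSERT INTO raw VALUES (?, ?)", [(i, f"row {i}") for i in range(1, 8)])

def claim_batch(conn, n):
    # Take the write lock up front, so the ids we read are exactly the ids we delete
    conn.execute("BEGIN IMMEDIATE")
    rows = conn.execute(
        "SELECT RawDataId, RawData FROM raw ORDER BY RawDataId LIMIT ?", (n,)
    ).fetchall()
    # Delete by captured key, never by a second TOP-n scan
    conn.executemany("DELETE FROM raw WHERE RawDataId = ?", [(r[0],) for r in rows])
    conn.execute("COMMIT")
    return rows

batch = claim_batch(conn, 5)
print([r[0] for r in batch])  # [1, 2, 3, 4, 5]
remaining = conn.execute("SELECT COUNT(*) FROM raw").fetchone()[0]
print(remaining)  # 2
```

Deleting by the captured primary keys, rather than re-running TOP 5, is what removes the original select/delete mismatch.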
I know that it's an old question, but I found a solution here: https://www.simple-talk.com/sql/learn-sql-server/the-delete-statement-in-sql-server/
DECLARE @Output table
(
    StaffID INT,
    FirstName NVARCHAR(50),
    LastName NVARCHAR(50),
    CountryRegion NVARCHAR(50)
);
DELETE SalesStaff
OUTPUT DELETED.* INTO @Output
FROM Sales.vSalesPerson sp
INNER JOIN dbo.SalesStaff ss
    ON sp.BusinessEntityID = ss.StaffID
WHERE sp.SalesLastYear = 0;
SELECT * FROM @Output;
Maybe it will be helpful for you.
I'm seeing behavior which looks like the READPAST hint is set on the database itself.
The rub: I don't think this is possible.
We have table foo (id int primary key identity, name varchar(50) not null unique);
I have several threads which do, basically
id = select id from foo where name = ?
if id == null
insert into foo (name) values (?)
id = select id from foo where name = ?
Each thread is responsible for inserting its own name (no two threads try to insert the same name at the same time). Client is java.
READ_COMMITTED_SNAPSHOT is ON, transaction isolation is specifically set to READ COMMITTED, using Connection.setTransactionIsolation( Connection.TRANSACTION_READ_COMMITTED );
Symptom is that if one thread is inserting, the other thread can't see its row (not even rows that were committed to the database before the application started) and tries to insert, but gets a duplicate-key exception from the unique index on name.
Throw me a bone here?
You're at the wrong isolation level. Remember what happens with the snapshot isolation level: if one transaction is making a change, no other concurrent transaction sees that change. Period. Other transactions will only see your changes once you have committed, and only if they start after your commit. The solution is to use a different isolation level: wrap your statements in a transaction and SET TRANSACTION ISOLATION LEVEL SERIALIZABLE. This will ensure that your concurrent transactions work as if they were all run serially, which is what you seem to want here.
Sounds like you're not wrapping the select and insert into a transaction?
As a solution, you could:
INSERT INTO foo (col1, col2, col3)
SELECT 'a', 'b', 'c'
WHERE NOT EXISTS (SELECT * FROM foo WHERE col1 = 'a')
After this, you can check @@ROWCOUNT: it will be 1 if a row was inserted.
SELECT SCOPE_IDENTITY()
should do the trick here...
plus wrapping into a transaction like previous poster mentioned.
The moral of this story is fully explained in my blog post "You can't hold onto nothing" but the short version of this is that you want to use the HOLDLOCK hint. I use the pattern:
INSERT INTO dbo.Foo(Name)
SELECT TOP 1
    @name AS Name
FROM (SELECT 1 AS FakeColumn) AS FakeTable
WHERE NOT EXISTS (SELECT * FROM dbo.Foo WITH (HOLDLOCK)
                  WHERE Name = @name)

SELECT ID FROM dbo.Foo WHERE Name = @name