I have a table with a unique constraint on it:
create table dbo.MyTab
(
MyTabID int primary key identity,
SomeValue nvarchar(50)
);
Create Unique Index IX_UQ_SomeValue
On dbo.MyTab(SomeValue);
Go
Which code is better to check for duplicates (success = 0 if duplicate found)?
Option 1
Declare @someValue nvarchar(50) = 'aaa';
Declare @success bit = 1;
Begin Try
    Insert Into MyTab(SomeValue) Values (@someValue);
End Try
Begin Catch
    -- let's assume that only constraint errors can happen
    Set @success = 0;
End Catch
Select @success
Option 2
Declare @someValue nvarchar(50) = 'aaa';
Declare @success bit = 1;
IF EXISTS (Select 1 From MyTab Where SomeValue = @someValue)
    Set @success = 0;
Else
    Insert Into MyTab(SomeValue) Values (@someValue);
Select @success
From my point of view, I believe TRY/CATCH is for errors that are NOT expected (like a deadlock, or even a constraint violation when duplicates are not supposed to occur). In this case it is quite possible that a user will sometimes submit a duplicate, so the error is expected.
I have found an article by Aaron Bertrand which states that checking for duplicates first is not much slower, even when most inserts succeed.
There is also plenty of advice on the net to use TRY/CATCH (so you issue 1 statement instead of 2). In my environment only about 1% of attempts would fail, so that argument makes some sense too.
What is your opinion? What other reasons are there to use option 1 or option 2?
UPDATE: I'm not sure whether it matters here, but the table has an INSTEAD OF UPDATE trigger (for audit purposes; row deletion also happens through an UPDATE statement).
I've seen that article but note that for low failure rates I'd prefer the "JFDI" pattern. I've used this on high volume systems before (40k rows/second).
In Aaron's code, you can still get a duplicate when testing first under high load and lots of writes (explained here on dba.se). This is important: your duplicates still happen, just less often. You still need exception handling, and you need to know when to ignore the duplicate-key error (2627).
Edit: explained succinctly by Remus in another answer
However, I would have a separate TRY/CATCH to test only for the duplicate error
BEGIN TRY
    -- stuff
    BEGIN TRY
        INSERT INTO dbo.MyTab (SomeValue) VALUES (@someValue);
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() <> 2627   -- re-raise anything that is not a duplicate key
            RAISERROR('Unexpected error during insert', 16, 1);
    END CATCH
    -- more stuff
END TRY
BEGIN CATCH
    RAISERROR('Outer block failed', 16, 1);
END CATCH
To start with, the EXISTS(SELECT ...) is incorrect because it fails under concurrency: multiple transactions could run the check concurrently and all conclude that they have to INSERT; one will be the lucky winner that inserts first, and all the rest will hit a constraint violation. In other words, you have a race condition between the check and the insert. So you will need TRY/CATCH anyway, so you may as well just TRY/CATCH.
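For completeness, one way to make a check-then-insert safe under that race is to hold a key-range lock across both statements. This is only a sketch against the question's dbo.MyTab, using lock hints instead of a SERIALIZABLE session setting:

-- Sketch only: serialize competing inserts of the same value by taking a key-range
-- lock during the existence check (assumes dbo.MyTab and its unique index from the question).
DECLARE @someValue nvarchar(50) = 'aaa';
DECLARE @success bit = 1;

BEGIN TRAN;
    IF EXISTS (SELECT 1 FROM dbo.MyTab WITH (UPDLOCK, HOLDLOCK)
               WHERE SomeValue = @someValue)
        SET @success = 0;
    ELSE
        INSERT INTO dbo.MyTab (SomeValue) VALUES (@someValue);
COMMIT TRAN;

SELECT @success;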
Error logging
Don't hold me to this, but there are likely logging implications when an exception is thrown. If you check before inserting, no such overhead is incurred.
Knowing why and when it can break
A TRY/CATCH block should be used for parts that can break for non-deterministic reasons. In your case I would say it is wiser to check for existing records, because you know it can break and exactly why. So checking it yourself is, from a developer's point of view, the better approach.
But your code may still break on the insert, because between the check and the insert some other user may have inserted the same value already. That is (as said previously) a non-deterministic error. That's why you:
should check with EXISTS, and
insert within TRY/CATCH (a sketch combining both follows at the end of this answer).
Self-explanatory code
Another plus is that it is plain to see from the code why it can break, while a TRY/CATCH block can hide that, and someone may later remove it thinking "why is this here, it's just inserting records...".
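A minimal sketch of that combination against the question's table (THROW assumes SQL Server 2012 or later):

DECLARE @someValue nvarchar(50) = 'aaa';
DECLARE @success bit = 1;

IF EXISTS (SELECT 1 FROM dbo.MyTab WHERE SomeValue = @someValue)
    SET @success = 0;                  -- the common, expected duplicate
ELSE
BEGIN
    BEGIN TRY
        INSERT INTO dbo.MyTab (SomeValue) VALUES (@someValue);
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() = 2627       -- duplicate key: another session won the race
            SET @success = 0;
        ELSE
            THROW;                     -- anything else is unexpected; re-raise it
    END CATCH
END

SELECT @success;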
Option 3
Begin Try
    SET XACT_ABORT ON;
    Begin Tran
        IF NOT EXISTS (Select 1 From MyTab Where SomeValue = @someValue)
        Begin
            Insert Into MyTab(SomeValue) Values ('aaa');
        End
    Commit Tran
End Try
Begin Catch
    Rollback Tran
End Catch
Why not implement an INSTEAD OF INSERT trigger on the table? You can check if the row exists, do nothing if it does, and insert the row if it doesn't.
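A rough sketch of such a trigger for the question's table might look like this (the trigger name is made up, and only SomeValue is carried through since MyTabID is an identity):

-- Sketch only: rows whose SomeValue already exists are silently skipped.
CREATE TRIGGER dbo.trg_MyTab_InsteadOfInsert
ON dbo.MyTab
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.MyTab (SomeValue)
    SELECT DISTINCT i.SomeValue
    FROM inserted AS i
    WHERE NOT EXISTS (SELECT 1 FROM dbo.MyTab AS t WHERE t.SomeValue = i.SomeValue);
END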
Related
Would there be any benefit in not using SCOPE_IDENTITY() and switching to @@IDENTITY? The area I'm talking about is part of an install script that sets up a database for our customers. It inserts a record into one table, then uses the identity key from that table as a foreign key in another table. We do this twice.
We seem to hit a rare condition in which, on the 2nd pass, the id from the first insert ends up in the 2nd table for both passes, causing data issues. Something else altogether may be causing this, but my lead has zeroed in on SCOPE_IDENTITY() as a possible culprit.
Declare @TheId int

Insert into dbo.TableName (Name) Values ('xxxx')
Select @TheId = SCOPE_IDENTITY()
-- some code here that uses @TheId
-- ...
Insert into dbo.TableName (Name) Values ('yyyy')
Select @TheId = SCOPE_IDENTITY()
-- some code here that uses @TheId
-- at this point, we may have the condition that SCOPE_IDENTITY() still holds the value from before the 2nd insert...
The only way SCOPE_IDENTITY() could still hold the prior id in this context is if the second INSERT statement does not create any rows. In that situation, @@IDENTITY isn't going to fix anything. In fact, @@IDENTITY is less specific, so it could only make things worse.
What you can do is use a different variable for the second insert. Or you could set @TheId back to NULL before the second insert runs; that way you'll be able to tell if something went wrong. @@ROWCOUNT is also useful for this.
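A minimal sketch of the reset-plus-@@ROWCOUNT idea, reusing the script from the question:

DECLARE @TheId int;

INSERT INTO dbo.TableName (Name) VALUES ('xxxx');
SELECT @TheId = SCOPE_IDENTITY();
-- ... code that uses @TheId ...

SET @TheId = NULL;   -- reset, so a failed second insert can't silently reuse the first id

INSERT INTO dbo.TableName (Name) VALUES ('yyyy');
IF @@ROWCOUNT = 1
    SELECT @TheId = SCOPE_IDENTITY();
ELSE
    RAISERROR('The second insert affected no rows', 16, 1);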
I did see this in the comments:
"The second insert did not fail as the record was found in the database."
I put it to you that perhaps the record was already in the database before the code ran. Moreover, if there is a constraint on the table, that could be the reason why the insert fails.
Within the scope of the proc or script, the @TheId set by the first insert is not the same value as the @TheId set by the second insert. While it's possible to reuse variables, it's not good practice imo when it comes to multiple DML statements within a code block. In this script I add TRY/CATCH and SET XACT_ABORT ON to ensure a complete rollback of all DML statements within the block.
Something like this
set nocount on;
set xact_abort on;

begin transaction
begin try
    Insert into dbo.TableName (Name) Values ('xxxx');
    if @@rowcount = 1
    begin
        Declare @Id1 int = SCOPE_IDENTITY();
        -- some code here that uses @Id1
        -- ...
    end
    else
        throw 50000, 'The first insert failed', 1;

    Insert into dbo.TableName (Name) Values ('yyyy');
    if @@rowcount = 1
    begin
        Declare @Id2 int = SCOPE_IDENTITY();
        -- some code here that uses @Id2
        -- ...
    end
    else
        throw 50000, 'The second insert failed', 1;

    commit transaction
end try
begin catch
    /* put error handling here */
    rollback transaction
end catch
Thanks everyone for the help. We will likely go with creating a new variable for the 2nd insert.
I have a stored procedure where I send in a user-defined table type. I have simplified it to make it easier to read.
CREATE TYPE [dbo].[ProjectTableType] AS TABLE(
[DbId] [uniqueidentifier] NOT NULL,
[DbParentId] [uniqueidentifier] NULL,
[Description] [text] NULL
)
CREATE PROCEDURE [dbo].[udsp_ProjectDUI] (@cmd varchar(10),
@tblProjects ProjectTableType READONLY) AS BEGIN

DECLARE @myNewPKTable TABLE (myNewPK uniqueidentifier)

IF(LOWER(@cmd) = 'insert')
BEGIN
    INSERT INTO
        dbo.Project
        (
        DbId,
        DbParentId,
        Description)
    OUTPUT INSERTED.DbId INTO @myNewPKTable
    SELECT NEWID(),
        DbParentId,
        Description
    FROM @tblProjects;

    SELECT * FROM dbo.Project WHERE dbid IN (SELECT myNewPK FROM @myNewPKTable);
END
END
This is for a DLL that other applications will use, so we aren't necessarily in charge of validation. I want to mimic BULK INSERT, where if one row fails to insert but the other rows are fine, the correct ones still get inserted. Is there a way to do this? I want to do the same for UPDATE, where if one row fails, the stored procedure continues trying to update the others.
The only option I can think of is to do one row at a time (either a loop in the calling code where the stored proc is called multiple times, or a loop inside the stored procedure), but I was wondering what the performance hit would be, or if there is a better solution.
Not sure which errors you're wanting to continue on, but unless you're running into all manner of unexpected errors, I'd try to avoid devolving into RBAR (row-by-agonizing-row) processing just yet.
Check Explicit Violations
The main concern I'd expect is PK violations, which you can avoid by simply checking for existence before the insert (and update). If there are other business-logic failure conditions, you can check those here as well.
insert into dbo.Project
(
    DbId,
    DbParentId,
    Description
)
output inserted.DbId
    into @myNewPKTable (DbId)
select
    DbId        = newid(),
    DbParentId  = s.DbParentId,
    Description = s.Description
from @tblProjects s -- source
    -- LOJ null check makes sure we don't violate the PK
    -- NOTE: I'm pretending this is an alternate key of the table.
    left outer join dbo.Project t -- target
        on s.DbParentId = t.DbParentId
where t.DbParentId is null
If at all possible, I'd try to stick with a batch update, and use join predicates to eliminate the possibility of most errors you expect to see. Changing to RBAR processing because you're worried you "might" get a system-shutdown failure is probably a waste of time. Then, if you hit a really nasty error you can't recover from, fail the batch legitimately. A batch UPDATE built the same way is sketched below.
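A hedged sketch of that kind of batch UPDATE against the same tables, assuming DbId is the join key between the TVP and dbo.Project:

-- Batch UPDATE: the join predicate means rows with no match in the target are
-- simply not touched, so "row not found" never becomes an error.
UPDATE t
SET    t.DbParentId  = s.DbParentId,
       t.Description = s.Description
FROM   dbo.Project AS t                 -- target
JOIN   @tblProjects AS s                -- source TVP from the procedure
       ON s.DbId = t.DbId;              -- assumed join key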
RBAR
Alternatively, if you absolutely need row-by-row granularity of successes or failures, you could do try/catch around each statement and have the catch block do nothing (or log something).
declare
    @DbParentId uniqueidentifier,
    @Description nvarchar(1000)

declare c cursor local fast_forward for
    select
        DbParentId = s.DbParentId,
        Description = s.Description
    from @tblProjects s

open c
fetch next from c into @DbParentId, @Description

while @@fetch_status = 0
begin

    begin try
        insert into dbo.Project
        (
            DbId,
            DbParentId,
            Description
        )
        output inserted.DbId
            into @myNewPKTable (DbId)
        select
            newid(),
            @DbParentId,
            @Description
    end try
    begin catch
        -- log something if you want
        print error_message()
    end catch

    fetch next from c into @DbParentId, @Description

end

close c
deallocate c
Hybrid
You might be able to get clever and hybridize things. One option might be to make the web-facing insert procedure actually insert into a lightweight, minimally keyed/constrained table (like a queue). Then, every minute or so, have an Agent job run through the logged calls and operate on them in a batch. This doesn't fundamentally change any of the patterns here, but it makes the processing asynchronous so the caller doesn't have to wait, and by batching requests together you can save processing power by piggybacking on what SQL does best: set-based ops.
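A very rough sketch of such a landing table; all names and types here are assumptions, not part of the original procedure:

-- Lightweight landing table: the web-facing proc only inserts here, and an Agent
-- job drains it into dbo.Project in batches.
CREATE TABLE dbo.ProjectQueue
(
    QueueId     int identity PRIMARY KEY,
    DbParentId  uniqueidentifier NULL,
    Description nvarchar(max)    NULL,
    EnqueuedAt  datetime2        NOT NULL DEFAULT sysutcdatetime(),
    ProcessedAt datetime2        NULL
);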
Another option might be to do the most set-based processing you can (using checks to prevent business rule or constraint violations). If anything fails, you could then spin off an RBAR process for the remaining rows. If all succeeds though, that RBAR process is never hit.
Conclusion
There are several ways to go about this. I'd try to use set-based operations as much as possible unless you have a really strong reason for needing row-by-row granularity.
You can avoid most errors just by constructing your insert/update statement correctly.
If you need to, you can use a try/catch with an "empty" catch block so failures don't stop the overall processing.
Depending on the odds and ends of your specific situation, you might want or need to hybridize these two approaches.
Using SQL Server 2014:
I am going through the following article, which includes useful patterns for T-SQL error handling:
https://msdn.microsoft.com/en-IN/library/ms175976.aspx
I like to log errors so that later on I can query, monitor, track and inspect the errors that took place in my application's stored procedures.
I was thinking of creating a table and inserting the error details as a row in the CATCH block; however, I am concerned this might not be a good pattern, or that there might be a built-in SQL Server feature that can log the errors generated by the ;THROW statement.
What would be the best way to log the errors?
Update 1
I should mention that I always set XACT_ABORT at the top of my SPs:
SET XACT_ABORT, NOCOUNT ON
Is it safe to assume that there is no way to log errors when XACT_ABORT is ON?
Update 2
The SET XACT_ABORT ON is according to this post:
http://www.sommarskog.se/error_handling/Part1.html#jumpXACT_ABORT
Could xp_logevent be a better alternative to adding an error record to a log table?
You have to be very careful with logging from CATCH blocks. First and foremost, you must check XACT_STATE() and honor it. If XACT_STATE() is -1 ('uncommittable transaction'), you cannot do any transactional operation, so the INSERT will fail. You must first roll back, then insert. But you cannot simply roll back, because you may be in xact_state 0 (no transaction), in which case the rollback would fail. And if xact_state is 1, you are still inside the original transaction, and your INSERT may still be rolled back later, and you'll lose all track of this error ever occurring.
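As a sketch of what honoring XACT_STATE() might look like in a CATCH block (dbo.ErrorLog and its columns are assumed, and THROW requires SQL Server 2012+):

BEGIN CATCH
    DECLARE @errNum int            = ERROR_NUMBER(),
            @errMsg nvarchar(4000) = ERROR_MESSAGE();

    IF XACT_STATE() = -1          -- doomed: no writes allowed until we roll back
        ROLLBACK TRANSACTION;

    -- In XACT_STATE() = 1 this INSERT still runs inside the caller's transaction,
    -- so it can be rolled back later; in state 0 (or after the rollback above) it sticks.
    INSERT INTO dbo.ErrorLog (ErrorNumber, ErrorMessage, LoggedAt)   -- table assumed
    VALUES (@errNum, @errMsg, sysutcdatetime());

    THROW;   -- re-raise for the caller (SQL Server 2012+)
END CATCH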
Another approach to consider is to generate a user-defined profiler event using sp_trace_generateevent and have a system trace monitoring your user event ID. This works in any xact_state and has the advantage of keeping the record even if the encompassing transaction rolls back later.
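If memory serves, raising such a user event looks roughly like the following; event IDs 82-91 map to UserConfigurable:0-9, but treat the exact call as an assumption to check against the documentation (MyProc is a placeholder name):

-- Inside the CATCH block: fire UserConfigurable:0 (event id 82); a server-side trace
-- subscribed to that event keeps the record even if the surrounding transaction rolls back.
DECLARE @info nvarchar(128) = LEFT(N'MyProc failed: ' + ISNULL(ERROR_MESSAGE(), N''), 128);
EXEC sp_trace_generateevent @eventid = 82, @userinfo = @info;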
I should mention that I always set XACT_ABORT
Stop doing this. Read Exception handling and nested transactions for a good SP pattern vis-a-vis error handling and transactions.
Yes, it is better.
If you want to store the errors, then try this:
declare @Error_msg_desc varchar(500)
    ,@Error_err_code int
    ,@Error_sev_num int
    ,@Error_proc_nm varchar(100)
    ,@Error_line_num int

begin try
    select 1/0
end try
begin catch
    select @Error_err_code = ERROR_NUMBER()
        ,@Error_msg_desc = ERROR_MESSAGE()
        ,@Error_sev_num = ERROR_SEVERITY()
        ,@Error_proc_nm = ERROR_PROCEDURE()
        ,@Error_line_num = ERROR_LINE()

    -- create the SqlLog table first (see below)
    -- Insert into Log Table
    Insert into Sqllog values(@Error_err_code,@Error_msg_desc,@Error_sev_num,@Error_proc_nm,@Error_line_num)
end catch
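The Sqllog table itself isn't defined in the snippet; a minimal definition matching that five-column INSERT might be:

CREATE TABLE dbo.Sqllog
(
    Error_err_code  int,
    Error_msg_desc  varchar(500),
    Error_sev_num   int,
    Error_proc_nm   varchar(100),
    Error_line_num  int
);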
For so long, I've omitted using SQL Transactions, mostly out of ignorance.
But let's say I have a procedure like this:
CREATE PROCEDURE CreatePerson
AS
BEGIN
declare @NewPerson INT
INSERT INTO PersonTable ( Columns... ) VALUES ( @Parameters... )
SET @NewPerson = SCOPE_IDENTITY()
INSERT INTO AnotherTable ( PersonID, CreatedOn ) VALUES ( @NewPerson, getdate() )
END
GO
In the above example, the second insert depends on the first, as in it will fail if the first one fails.
Secondly, and for whatever reason, transactions confuse me as far as proper implementation goes. I see one example here, another there, and I just opened up AdventureWorks to find yet another example with try, catch, rollback, etc.
I'm not logging errors. Should I use a transaction here? Is it worth it?
If so, how should it be properly implemented? Based on the examples I've seen:
CREATE PROCEDURE CreatePerson
AS
BEGIN TRANSACTION
....
COMMIT TRANSACTION
GO
Or:
CREATE PROCEDURE CreatePerson
AS
BEGIN
BEGIN TRANSACTION
COMMIT TRANSACTION
END
GO
Or:
CREATE PROCEDURE CreatePerson
AS
BEGIN
BEGIN TRY
BEGIN TRANSACTION
...
COMMIT TRANSACTION
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0
BEGIN
ROLLBACK TRANSACTION
END
END CATCH
END
Lastly, in my real code, I have more like 5 separate inserts all based on the newly generated ID for person. If you were me, what would you do? This question is perhaps redundant or a duplicate, but for whatever reason I can't seem to reconcile in my mind the best way to handle this.
Another area of confusion is the rollback. If a transaction must be committed as a single unit of operation, what happens if you don't use the rollback? Or is the rollback needed only in a Try/Catch similar to vb.net/c# error handling?
You are probably missing the point of this: transactions are supposed to make a set of separate actions into one, so if one fails, you can roll back and your database will stay as if nothing happened.
This is easier to see if, let's say, you are saving the details of a purchase in a store. You save the data of the customer (like Name or Address), but somewhere in between, the details get lost (server crash). So now you know that John Doe bought something, but you don't know what. Your data integrity is at stake.
Your third sample code is correct if you want to handle transactions in the SP. To return an error, you can try:
RETURN @@ERROR
after the ROLLBACK. Also, please read about:
set xact_abort on
as in: SQL Server - transactions roll back on error?
If the first insert succeeds and the second fails, you will have a database in a bad state, because SQL Server cannot read your mind. It will leave the first insert (change) in the database even though you probably wanted it all to succeed or all to fail.
To ensure this, you should wrap all the statements in a transaction, as you illustrated in the last example. It's important to have a CATCH so any half-completed transaction is explicitly rolled back and the resources (used by the transaction) are released as soon as possible.
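Putting the last example together with SET XACT_ABORT ON, a sketch of the whole procedure could look like this (the single @Name parameter and Name column are hypothetical stand-ins for the question's elided columns; THROW needs SQL Server 2012+):

CREATE PROCEDURE CreatePerson
    @Name nvarchar(100)            -- hypothetical stand-in for the real parameters
AS
BEGIN
    SET NOCOUNT, XACT_ABORT ON;

    DECLARE @NewPerson int;

    BEGIN TRY
        BEGIN TRANSACTION;

        INSERT INTO PersonTable (Name) VALUES (@Name);
        SET @NewPerson = SCOPE_IDENTITY();

        INSERT INTO AnotherTable (PersonID, CreatedOn) VALUES (@NewPerson, getdate());
        -- ...the remaining inserts that depend on @NewPerson go here...

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;
        THROW;   -- surface the original error to the caller (SQL Server 2012+)
    END CATCH
END
GO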
I simply want a stored procedure that calculates a unique id (separate from the identity column) and inserts it. If it fails, it just calls itself to regenerate said id. I have been looking for an example but can't find one, and I am not sure how I should get the SP to call itself and set the appropriate output parameter. I would also appreciate someone pointing out how to test this SP.
Edit
What I have now come up with is the following (note that I already have an identity column; I need a secondary id column).
ALTER PROCEDURE [dbo].[DataInstance_Insert]
    @DataContainerId int out,
    @ModelEntityId int,
    @ParentDataContainerId int,
    @DataInstanceId int out
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    WHILE (@DataContainerId is null)
        EXEC DataContainer_Insert @ModelEntityId, @ParentDataContainerId, @DataContainerId output

    INSERT INTO DataInstance (DataContainerId, ModelEntityId)
    VALUES (@DataContainerId, @ModelEntityId)

    SELECT @DataInstanceId = scope_identity()
END
ALTER PROCEDURE [dbo].[DataContainer_Insert]
    @ModelEntityId int,
    @ParentDataContainerId int,
    @DataContainerId int out
AS
BEGIN
    BEGIN TRY
        SET NOCOUNT ON;

        DECLARE @ReferenceId int
        SELECT @ReferenceId = isnull(Max(ReferenceId)+1,1) from DataContainer Where ModelEntityId=@ModelEntityId

        INSERT INTO DataContainer (ReferenceId, ModelEntityId, ParentDataContainerId)
        VALUES (@ReferenceId, @ModelEntityId, @ParentDataContainerId)

        SELECT @DataContainerId = scope_identity()
    END TRY
    BEGIN CATCH
    END CATCH
END
In CATCH blocks you must check the XACT_STATE() value. You may be in a doomed transaction (-1), in which case you are forced to roll back. Or your transaction may have already been rolled back, and you should not continue to work under the assumption of an existing transaction. For a template procedure that handles T-SQL exceptions, TRY/CATCH blocks and transactions correctly, see Exception handling and nested transactions.
Never, in any language, make recursive calls in exception blocks. You don't check why you hit an exception, therefore you don't know whether it is OK to try again. What if the exception is 652, read-only filegroup? Or your database is at max size? You'll recurse until you hit a stack overflow...
Code that reads a value, makes a decision based on that value, then writes something is always going to fail under concurrency unless properly protected. You need to wrap the SELECT and INSERT in a transaction, and your SELECT must run under the SERIALIZABLE isolation level.
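A sketch of that protection applied to the DataContainer_Insert body; the UPDLOCK/HOLDLOCK hints on the read stand in for setting the isolation level explicitly:

BEGIN TRANSACTION;

    DECLARE @ReferenceId int;

    -- UPDLOCK + HOLDLOCK hold a key-range lock on the scanned range (the effect
    -- SERIALIZABLE would give), so two sessions cannot compute the same MAX + 1.
    SELECT @ReferenceId = isnull(MAX(ReferenceId) + 1, 1)
    FROM dbo.DataContainer WITH (UPDLOCK, HOLDLOCK)
    WHERE ModelEntityId = @ModelEntityId;

    INSERT INTO dbo.DataContainer (ReferenceId, ModelEntityId, ParentDataContainerId)
    VALUES (@ReferenceId, @ModelEntityId, @ParentDataContainerId);

    SELECT @DataContainerId = scope_identity();

COMMIT TRANSACTION;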
And finally, ignoring the blatantly wrong code in your post, here is how you call a stored procedure passing in OUTPUT arguments:
exec DataContainer_Insert @SomeData, @DataContainerId OUTPUT;
Better yet, why not make UserID an identity column instead of trying to re-implement an identity column manually?
BTW: I think you meant
VALUES (@DataContainerId + 1 , SomeData)
Why not use the:
NewId()
T-SQL function? (assuming SQL Server 2005/2008)
That SP will never do a successful insert: you have an identity property on the DataContainer table but you are inserting the ID. In that case you would need to SET IDENTITY_INSERT ON, but then scope_identity() won't work.
A PK violation also might not be trapped, so you might also need to check XACT_STATE().
Why are you messing around with MAX? Use scope_identity() and be done with it.