I have a stored procedure where I send in a user-defined type which is a table. I have simplified it to make it easier to read.
CREATE TYPE [dbo].[ProjectTableType] AS TABLE(
[DbId] [uniqueidentifier] NOT NULL,
[DbParentId] [uniqueidentifier] NULL,
[Description] [text] NULL
)
CREATE PROCEDURE [dbo].[udsp_ProjectDUI] (@cmd varchar(10),
@tblProjects ProjectTableType READONLY) AS BEGIN
DECLARE @myNewPKTable TABLE (myNewPK uniqueidentifier)
IF(LOWER(@cmd) = 'insert')
BEGIN
INSERT INTO
dbo.Project
(
DbId,
DbParentId,
Description)
OUTPUT INSERTED.DbId INTO @myNewPKTable
SELECT NEWID(),
DbParentId,
Description
FROM @tblProjects;
SELECT * FROM dbo.Project WHERE DbId IN (SELECT myNewPK FROM @myNewPKTable);
END
END
This is for a DLL that other applications will use, so we aren't necessarily in charge of validation. I want to mimic BULK INSERT, where if one row fails to insert but the other rows are fine, the correct ones will still insert. Is there a way to do this? I want to do it for UPDATE as well, where if one row fails, the stored procedure will continue trying to update the others.
The only option I can think of is to do one row at a time (either a loop in the calling code where the stored proc is called multiple times, or a loop in the stored procedure), but I was wondering what the performance hit of that would be, or whether there is a better solution.
Not sure which errors you're wanting to continue on, but unless you're running into all manner of unexpected errors, I'd try to avoid devolving into RBAR (row-by-agonizing-row processing) just yet.
Check Explicit Violations
The main thing I would expect would be PK violations, which you can avoid by just checking for existence before the insert (and update). If there are other business-logic failure conditions, you can check those here as well.
insert into dbo.Project
(
DbId,
DbParentId,
Description
)
output inserted.DbId
into @myNewPKTable (DbId)
select
DbId = newid(),
DbParentId = s.DbParentId,
Description = s.Description
from @tblProjects s -- source
-- LOJ null check makes sure we don't violate the PK
-- NOTE: I'm pretending DbParentId is an alternate key of the table.
left outer join dbo.Project t -- target
on s.DbParentId = t.DbParentId
where t.DbParentId is null
If at all possible, I'd try to stick with a batch update, and use join predicates to eliminate the possibility of most errors you expect to see. Changing to RBAR processing because you're worried you "might" get a system shutdown failure is probably a waste of time. Then, if you hit a really nasty error you can't recover from, fail the batch, legitimately.
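For illustration, a minimal sketch of that batch update, assuming dbo.Project is keyed on DbId and that @tblProjects carries the new values (both assumptions on my part):
update t
set
DbParentId = s.DbParentId,
Description = s.Description
from @tblProjects s -- source
join dbo.Project t -- target
on t.DbId = s.DbId -- the inner join quietly skips rows with no matching target instead of erroring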
RBAR
Alternatively, if you absolutely need row-by-row granularity of successes or failures, you could do try/catch around each statement and have the catch block do nothing (or log something).
declare
@DbParentId uniqueidentifier,
@Description nvarchar(1000),
@Ident int
declare c cursor local fast_forward for
select
DbParentId = s.DbParentId,
Description = s.Description
from @tblProjects s
open c
fetch next from c into @DbParentId, @Description
while @@fetch_status = 0
begin
begin try
insert into dbo.Project
(
DbId,
DbParentId,
Description
)
output inserted.DbId
into @myNewPKTable (DbId)
select
newid(),
@DbParentId,
@Description
end try
begin catch
-- log something if you want
print error_message()
end catch
fetch next from c into @DbParentId, @Description
end
close c
deallocate c
Hybrid
You might be able to get clever and hybridize things. One option might be to make the web-facing insert procedure actually insert into a lightweight, minimally keyed/constrained table (like a queue). Then, every minute or so, have an Agent job run through the logged calls and operate on them in a batch. It doesn't fundamentally change any of the patterns here, but it makes the processing asynchronous so the caller doesn't have to wait, and by batching requests together you can save processing power by piggybacking on what SQL does best: set-based ops.
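A rough sketch of that staging table, using a made-up name (dbo.ProjectInsertQueue) and columns that simply mirror the table type:
create table dbo.ProjectInsertQueue
(
QueueId int identity(1,1) primary key,
DbParentId uniqueidentifier null,
Description nvarchar(max) null,
EnqueuedAt datetime2(0) not null default sysutcdatetime(),
ProcessedAt datetime2(0) null -- stamped by the Agent job once the row has been handled
)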
Another option might be to do the most set-based processing you can (using checks to prevent business rule or constraint violations). If anything fails, you could then spin off an RBAR process for the remaining rows. If all succeeds though, that RBAR process is never hit.
Conclusion
There are several ways to go about this. I'd try to use set-based operations as much as possible unless you have a really strong reason for needing row-by-row granularity.
You can avoid most errors just by constructing your insert/update statements correctly.
If you need to, you can use a try/catch with an "empty" catch block so failures don't stop the overall processing.
Depending on the odds and ends of your specific situation, you might want or need to hybridize these two approaches.
I have already completed this job, but I'm wondering whether what I did was correct and/or whether you would have performed the job differently.
A couple of the infrastructure guys came to me saying that they were moving the storage for one of the SQL Servers, and they wanted to check whether there was any break in connection or performance while they were doing it. They had a company in that said it could move the storage without any impact on the business, and this was the first try.
To test this out, they wanted me to write a query that would use the storage and report any breaks, so I wrote the following script ad hoc:
CREATE TABLE [dbo].[TomAndRickysSpecialTable]
( [Counter] INT IDENTITY(1,1) NOT NULL
,[Result] VARCHAR(10) NOT NULL
,[TimeStamp] DATETIME2(0) NOT NULL DEFAULT(GETDATE())
,CONSTRAINT PK_TikkiTavi PRIMARY KEY CLUSTERED ([counter] ASC)
)
exec sp_configure 'remote query timeout', 0
go
reconfigure with override
go
CREATE PROC RICKYRICKYRICKYRICKYrickyTom
AS
TRUNCATE TABLE TomAndRickysSpecialTable;
INSERT TomAndRickysSpecialTable(Result) VALUES ('Start');
SET NOCOUNT ON;
TryAgain:
WHILE 1 = 1
BEGIN
BEGIN TRY
WAITFOR DELAY '00:00:01';
IF EXISTS (SELECT TOP 1 [Counter] FROM TomAndRickysSpecialTable)
INSERT TomAndRickysSpecialTable(Result) VALUES ('Yup');
RAISERROR('Yup',10,1) WITH NOWAIT;
END TRY
BEGIN CATCH
RAISERROR('Nop',10,1) WITH NOWAIT;
Goto TryAgain;
END CATCH
END
I created the table and sproc, and they executed it from the server, leaving it to run and observing any breaks in the second-by-second timestamps.
I figured I should write something that queried and wrote to the memory, so that we could see if there was a problem, but I have no idea whether that reasoning was correct...
My question is this - would my script have actually done what they wanted? Was there any point in running it or would it for instance have failed so badly if there was a break that there was no reportable result? Could I have written a simpler or better script to do the same job?
PS. I hated writing the While loop and the goto, but figured it would do the job in this weird case!
would my script have actually done what they wanted?
Yes. SQL Server caches data in memory, but changes must be hardened to disk on commit, and by default SQL Server commits after each statement. So the INSERT would have failed if the disk containing the database log was unavailable.
BTW 'remote query timeout' is irrelevant.
I hated writing the While loop and the goto,
It's not needed. After the CATCH block the WHILE loop would have run again.
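A minimal sketch of the same loop without the label, assuming the rest of the procedure stays as posted:
WHILE 1 = 1
BEGIN
BEGIN TRY
WAITFOR DELAY '00:00:01';
IF EXISTS (SELECT TOP 1 [Counter] FROM TomAndRickysSpecialTable)
INSERT TomAndRickysSpecialTable(Result) VALUES ('Yup');
RAISERROR('Yup',10,1) WITH NOWAIT;
END TRY
BEGIN CATCH
-- no GOTO needed: control falls out of the CATCH block
-- and the WHILE condition is evaluated again
RAISERROR('Nop',10,1) WITH NOWAIT;
END CATCH
END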
We have a process that is causing dirty read errors. So, I was thinking of redesigning it as a queue with a single process to go through the queue.
My idea was to create a table that various processes could insert into. Then, one process actually processes the records in the table. But, we need real-time results from the processing method. So, while I was thinking of scheduling it to run every couple of seconds, that may not be fast enough. I don't want a user waiting several seconds for a result.
Essentially, I was thinking of using an infinite loop so that one stored procedure is constantly running, and that stored procedure creates transactions to perform updates.
It could be something like:
WHILE 1=1
BEGIN
--Check for new records
IF NewRecordsExist
BEGIN
--Mark new records as "in process"
BEGIN TRANSACTION
--Process records
--If errors, Rollback
--Otherwise Commit transaction
END
END
But, I don't want SQL Server to get overburdened by this one method. Basically, I want this running in the background all the time, not eating up processor power. Unless there is a lot for it to do. Then, I want it doing its work. Is there a better design pattern for this? Are stored procedures with infinite loops thread-safe? I use this pattern all the time in Windows Processes, but this particular task is not appropriate for a Windows Process.
Update: We have transaction locking set. That is not the issue. We are trying to set aside items that are reserved for orders. So, we have a stored procedure start a transaction, check what is available, and then update the table to mark what is reserved.
The problem is that when two different users attempt to reserve the same product at the same time, the first process checks availability, finds the product available, and starts to reserve it. But the second process cannot see what the first process is doing (we have transaction locking set), so it has no idea that another process is trying to reserve the items. It sees the items as still available and also goes to reserve them.
We considered application locking, but we are worried about waits, delays, etc. So another solution we came up with is one process that handles reservations in a queue. It is first come, first served. Only one process will ever be reading the queue at a time. Different processes can add to the queue, but we no longer need to worry about two processes trying to reserve the same product at the same time. Only one process will be doing the reservations. I was hoping to do this all in SQL, but that may not be possible.
Disclaimer: This may be an option, but the recommendations for using Service Broker to serialize requests are likely the better solution.
If you can't use a transaction but need your stored procedure to return an immediate result, there are ways to safely update a record in a single statement.
DECLARE @ProductId INT = 123
DECLARE @Quantity INT = 5
UPDATE Inventory
SET Available = Available - @Quantity
WHERE ProductId = @ProductId
AND Available >= @Quantity
IF @@ROWCOUNT > 0
BEGIN
-- Success
END
Under the covers, there is still a transaction occurring accompanied by a lock, but it just covers this one statement.
If you need to update multiple records (reserve multiple product IDs) in one statement, you can use the OUTPUT clause to capture which records were successfully updated, which you can then compare with the original request.
DECLARE @Request TABLE (ProductId INT, Quantity INT)
DECLARE @Result TABLE (ProductId INT, Quantity INT)
INSERT @Request VALUES (123, 5), (456, 1)
UPDATE I
SET Available = Available - R.Quantity
OUTPUT R.ProductId, R.Quantity INTO @Result
FROM @Request R
JOIN Inventory I
ON I.ProductId = R.ProductId
AND I.Available >= R.Quantity
IF (SELECT COUNT(*) FROM @Request) = (SELECT COUNT(*) FROM @Result)
BEGIN
-- Success
END
ELSE BEGIN
-- Partial or fail. Need to undo those that may have been updated.
END
In any case, you need to thoroughly think through your error handling and undo scenarios.
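For example, a hedged sketch of one possible undo for the partial case, assuming the @Result rows above are exactly the ones that were updated:
-- Put back the quantities that were subtracted for the rows that did succeed
UPDATE I
SET Available = Available + R.Quantity
FROM @Result R
JOIN Inventory I
ON I.ProductId = R.ProductId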
If you store reservations separate from inventory and define "Available" as "Inventory.OnHand - SUM(Reserved.Quantity)", this approach is not an option.
(Comments as to why this is a bad approach will likely follow.)
Ok - so the answer is probably 'everything is within a transaction in SQL Server'. But here's my problem.
I have a stored procedure which returns data based on one input parameter. This is being called from an external service which I believe is Java-based.
I have a log table to see how long each hit to this proc is taking. It's kind of like this (simplified)...
TABLE log (ID INT PRIMARY KEY IDENTITY(1,1), Param VARCHAR(10), [In] DATETIME, [Out] DATETIME)
PROC troublesome
(@param VARCHAR(10))
BEGIN
--log the 'in' time and keep the ID
DECLARE @logid INT
INSERT log ([In], Param) VALUES (GETDATE(), @param)
SET @logid = SCOPE_IDENTITY()
SELECT <some stuff from another table based on @param>
--set the 'out' time
UPDATE log SET [Out] = GETDATE() WHERE ID = @logid
END
So far so easily-criticised by SQL fascists.
Anyway - when I call this, e.g. troublesome 'SOMEPARAM', I get the result of the SELECT back, and in the log table there is a nice new row with SOMEPARAM and the in and out timestamps.
Now - I watch this table, and even though no rows are going in, if I generate a row I can see that the ID has skipped many places. I guess this is being caused by the external client code hitting it. They are getting the result of the SELECT - but I am not getting their log data.
This suggests to me that they are wrapping their client call in a TRAN and rolling it back, which is one thing that would cause this behaviour. I want to know...
If there is a way I can FORCE the write of the log even if it is contained within a transaction over which I have no control and which is subsequently rolled back (seems unlikely)
If there is a way I can determine from within this proc if it is executing within a transaction (and perhaps raise an error)
If there are other possible explanations for the ID skipping (and it's skipping a lot - like 1000 places in an hour - so I'm sure it's caused by the client code, as they also report successfully retrieving the results of the SELECT).
Thanks!
For the "why are rows skipped" part of the question, your assumption sounds plausible to me.
For how to force the write even within a transaction, I don't know.
As for how to discover whether the SP is running inside a transaction how about the following:
SELECT @@TRANCOUNT
I tested this out in a SQL batch and it seems to be OK. Maybe it can take you a bit further.
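As a rough sketch of how that check might sit at the top of the proc (the error text and severity are just placeholders of mine):
IF @@TRANCOUNT > 0
BEGIN
-- Called inside a caller-controlled transaction: any log rows written here
-- will disappear if the caller rolls back.
RAISERROR('troublesome was called inside an open transaction', 16, 1);
RETURN;
END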
I have a system which requires I have IDs on my data before it goes to the database. I was using GUIDs, but found them to be too big to justify the convenience.
I'm now experimenting with implementing a sequence generator which basically reserves a range of unique ID values for a given context. The code is as follows;
ALTER PROCEDURE [dbo].[Sequence.ReserveSequence]
@Name varchar(100),
@Count int,
@FirstValue bigint OUTPUT
AS
BEGIN
SET NOCOUNT ON;
-- Ensure the parameters are valid
IF (@Name IS NULL OR @Count IS NULL OR @Count < 0)
RETURN -1;
-- Reserve the sequence
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION
-- Get the sequence ID, and the last reserved value of the sequence
DECLARE @SequenceID int;
DECLARE @LastValue bigint;
SELECT TOP 1 @SequenceID = [ID], @LastValue = [LastValue]
FROM [dbo].[Sequences]
WHERE [Name] = @Name;
-- Ensure the sequence exists
IF (@SequenceID IS NULL)
BEGIN
-- Create the new sequence
INSERT INTO [dbo].[Sequences] ([Name], [LastValue])
VALUES (@Name, @Count);
-- The first reserved value of a sequence is 1
SET @FirstValue = 1;
END
ELSE
BEGIN
-- Update the sequence
UPDATE [dbo].[Sequences]
SET [LastValue] = @LastValue + @Count
WHERE [ID] = @SequenceID;
-- The sequence start value will be the last previously reserved value + 1
SET @FirstValue = @LastValue + 1;
END
COMMIT TRANSACTION
END
The 'Sequences' table is just an ID, Name (unique), and the last allocated value of the sequence. Using this procedure I can request N values in a named sequence and use these as my identifiers.
This works great so far - it's extremely quick since I don't have to constantly ask for individual values, I can just use up a range of values and then request more.
The problem is that at extremely high frequency, calling the procedure concurrently can sometimes result in a deadlock. I have only seen this occur during stress testing, but I'm worried it'll crop up in production. Are there any notable flaws in this procedure, and can anyone recommend a way to improve on it? It would be nice to do this without transactions, for example, but I do need it to be 'thread safe'.
MS themselves offer a solution and even they say it locks/deadlocks.
If you want to add some lock hints then you'd reduce concurrency for your high loads
Options:
You could develop against the "Denali" CTP which is the next release
Use IDENTITY and the OUTPUT clause like everyone else
Adopt/modify the solutions above
On DBA.SE there is "Emulate a TSQL sequence via a stored procedure": see dportas' answer which I think extends the MS solution.
I'd recommend sticking with the GUIDs if, as you say, this is mostly about composing data ready for a bulk insert (it's simpler than what I present below).
As an alternative, could you work with a restricted count? Say, 100 ID values at a time? In that case, you could have a table with an IDENTITY column, insert into that table, return the generated ID (say, 39), and then your code could assign all values between 3900 and 3999 (e.g. multiply up by your assumed granularity) without consulting the database server again.
Of course, this could be extended to allocating multiple IDs in a single call - provided that you're okay with some IDs potentially going unused. E.g. you need 638 IDs, so you ask the database to assign you 7 new ID values (which implies that you've allocated 700 values), use the 638 you want, and the remaining 62 never get assigned.
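A minimal sketch of that idea, assuming a hypothetical dbo.IdBlocks table whose only job is to hand out block numbers:
-- One row = one reserved block of 100 IDs
CREATE TABLE dbo.IdBlocks
(
BlockId INT IDENTITY(1,1) PRIMARY KEY,
RequestedAt DATETIME2(0) NOT NULL DEFAULT SYSUTCDATETIME()
)
GO
DECLARE @BlockSize INT = 100, @BlockId INT
INSERT dbo.IdBlocks DEFAULT VALUES -- reserve one block
SET @BlockId = SCOPE_IDENTITY()
-- The caller now owns (@BlockId * @BlockSize) through (@BlockId * @BlockSize + @BlockSize - 1)
SELECT FirstId = @BlockId * @BlockSize,
LastId = @BlockId * @BlockSize + @BlockSize - 1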
Can you get some kind of deadlock trace? For example, enable trace flag 1222 as shown here. Duplicate the deadlock. Then look in the SQL Server log for the deadlock trace.
Also, you might inspect what locks are taken out in your code by inserting a call to exec sp_lock or select * from sys.dm_tran_locks immediately before the COMMIT TRANSACTION.
Most likely you are observing a conversion deadlock. To avoid them, you want to make sure that your table is clustered and has a PK, but this advice is specific to 2005 and 2008 R2, and they can change the implementation, rendering this advice useless. Google up "Some heap tables may be more prone to deadlocks than identical tables with clustered indexes".
Anyway, if you observe an error during stress testing, it is likely that sooner or later it will occur in production as well.
You may want to use sp_getapplock to serialize your requests. Google up "Application Locks (or Mutexes) in SQL Server 2005". Also I described a few useful ideas here: "Developing Modifications that Survive Concurrency".
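As a rough sketch (not the full pattern from those articles), sp_getapplock could serialize the sequence reservation like this; the resource name is arbitrary and my own choice:
BEGIN TRANSACTION
DECLARE @rc int
EXEC @rc = sp_getapplock
@Resource = 'ReserveSequence',
@LockMode = 'Exclusive',
@LockOwner = 'Transaction',
@LockTimeout = 5000 -- ms
IF @rc >= 0
BEGIN
-- read and update dbo.Sequences here; only one caller at a time gets this far
COMMIT TRANSACTION
END
ELSE
BEGIN
ROLLBACK TRANSACTION -- could not acquire the lock in time
END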
I thought I'd share my solution. It doesn't deadlock, nor does it produce duplicate values. An important difference between this and my original procedure is that it doesn't create the sequence if it doesn't already exist:
ALTER PROCEDURE [dbo].[ReserveSequence]
(
@Name nvarchar(100),
@Count int,
@FirstValue bigint OUTPUT
)
AS
BEGIN
SET NOCOUNT ON;
IF (@Count <= 0)
BEGIN
SET @FirstValue = NULL;
RETURN -1;
END
DECLARE @Result TABLE ([LastValue] bigint)
-- Update the sequence last value, and get the previous one
UPDATE [Sequences]
SET [LastValue] = [LastValue] + @Count
OUTPUT DELETED.LastValue INTO @Result
WHERE [Name] = @Name;
-- Select the first value
SELECT TOP 1 @FirstValue = [LastValue] + 1 FROM @Result;
END
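A quick usage sketch, assuming a row named 'Project' already exists in the Sequences table (the name is only an example):
DECLARE @First bigint
EXEC [dbo].[ReserveSequence]
@Name = 'Project',
@Count = 10,
@FirstValue = @First OUTPUT
-- The caller now owns @First through @First + 9
SELECT FirstValue = @First, LastValue = @First + 9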
I simply want a stored procedure that calculates a unique ID (separate from the identity column) and inserts it. If it fails, it just calls itself to regenerate said ID. I have been looking for an example but can't find one, and I am not sure how I should get the SP to call itself and set the appropriate output parameter. I would also appreciate someone pointing out how to test this SP.
Edit
What I have now come up with is the following. (Note: I already have an identity column; I need a secondary ID column.)
ALTER PROCEDURE [dbo].[DataInstance_Insert]
@DataContainerId int out,
@ModelEntityId int,
@ParentDataContainerId int,
@DataInstanceId int out
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
WHILE (@DataContainerId is null)
EXEC DataContainer_Insert @ModelEntityId, @ParentDataContainerId, @DataContainerId output
INSERT INTO DataInstance (DataContainerId, ModelEntityId)
VALUES (@DataContainerId, @ModelEntityId)
SELECT @DataInstanceId = scope_identity()
END
ALTER PROCEDURE [dbo].[DataContainer_Insert]
@ModelEntityId int,
@ParentDataContainerId int,
@DataContainerId int out
AS
BEGIN
BEGIN TRY
SET NOCOUNT ON;
DECLARE @ReferenceId int
SELECT @ReferenceId = isnull(Max(ReferenceId)+1,1) from DataContainer Where ModelEntityId=@ModelEntityId
INSERT INTO DataContainer (ReferenceId, ModelEntityId, ParentDataContainerId)
VALUES (@ReferenceId, @ModelEntityId, @ParentDataContainerId)
SELECT @DataContainerId = scope_identity()
END TRY
BEGIN CATCH
END CATCH
END
In CATCH blocks you must check the XACT_STATE value. You may be in a doomed transaction (-1), in which case you are forced to roll back. Or your transaction may have already rolled back, and you should not continue to work under the assumption of an existing transaction. For a template procedure that handles T-SQL exceptions, try/catch blocks and transactions correctly, see Exception handling and nested transactions.
Never, in any language, make recursive calls in exception blocks. You don't check why you hit an exception, therefore you don't know whether it is OK to try again. What if the exception is 652, read-only filegroup? Or your database is at max size? You'll recurse until you hit a stack overflow...
Code that reads a value, makes a decision based on that value, then writes something is always going to fail under concurrency unless properly protected. You need to wrap the SELECT and INSERT in a transaction, and your SELECT must run under the SERIALIZABLE isolation level.
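A hedged sketch of what that protected read-then-insert might look like inside DataContainer_Insert; the UPDLOCK, HOLDLOCK hints are one common way to get serializable-style protection on just this key range:
BEGIN TRANSACTION
DECLARE @ReferenceId int
-- The hints keep the MAX() read locked until commit, so two concurrent callers
-- cannot both compute the same next ReferenceId
SELECT @ReferenceId = ISNULL(MAX(ReferenceId) + 1, 1)
FROM dbo.DataContainer WITH (UPDLOCK, HOLDLOCK)
WHERE ModelEntityId = @ModelEntityId
INSERT INTO dbo.DataContainer (ReferenceId, ModelEntityId, ParentDataContainerId)
VALUES (@ReferenceId, @ModelEntityId, @ParentDataContainerId)
SELECT @DataContainerId = SCOPE_IDENTITY()
COMMIT TRANSACTION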
And finally, ignoring the blatantly wrong code in your post, here is how you call a stored procedure passing in OUTPUT arguments:
exec DataContainer_Insert @SomeData, @DataContainerId OUTPUT;
Better yet, why not make UserID an identity column instead of trying to re-implement an identity column manually?
BTW: I think you meant
VALUES (@DataContainerId + 1, SomeData)
Why not use the:
NEWID()
T-SQL function? (assuming SQL Server 2005/2008)
That SP will never do a successful insert: you have an identity property on the DataContainer table but you are inserting the ID. In that case you would need to SET IDENTITY_INSERT ON, but then SCOPE_IDENTITY() won't work.
A PK violation also might not be trapped so you might also need to check for XACT_STATE()
Why are you messing around with MAX? Use SCOPE_IDENTITY() and be done with it.
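A minimal sketch of a CATCH block that honours the XACT_STATE() point above, instead of the empty catch in the posted proc:
BEGIN CATCH
IF XACT_STATE() = -1
BEGIN
-- Doomed transaction: the only legal action is to roll back
ROLLBACK TRANSACTION
END
-- Re-raise or log ERROR_NUMBER() / ERROR_MESSAGE() here rather than swallowing the error
END CATCH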