I have a stored procedure that is responsible for inserting or updating multiple records at once. I want to perform this in my stored procedure for the sake of performance.
This stored procedure takes in a comma-delimited list of permit IDs and a status. The permit IDs are stored in a variable called #PermitIDs. The status is stored in a variable called #Status. I have a user-defined function that converts this comma-delimited list of permit IDs into a Table. I need to go through each of these IDs and do either an insert or update into a table called PermitStatus.
If a record with the permit ID does not exist, I want to add a record. If it does exist, I'm want to update the record with the given #Status value. I know how to do this for a single ID, but I do not know how to do it for multiple IDs. For single IDs, I do the following:
-- Determine whether to add or edit the PermitStatus
DECLARE #count int
SET #count = (SELECT Count(ID) FROM PermitStatus WHERE [PermitID]=#PermitID)
-- If no records were found, insert the record, otherwise add
IF #count = 0
BEGIN
INSERT INTO
PermitStatus
(
[PermitID],
[UpdatedOn],
[Status]
)
VALUES
(
#PermitID,
GETUTCDATE(),
1
)
END
ELSE
UPDATE
PermitStatus
SET
[UpdatedOn]=GETUTCDATE(),
[Status]=#Status
WHERE
[PermitID]=#PermitID
How do I loop through the records in the Table returned by my user-defined function to dynamically insert or update the records as needed?
create a split function, and use it like:
SELECT
*
FROM YourTable y
INNER JOIN dbo.splitFunction(#Parameter) s ON y.ID=s.Value
I prefer the number table approach
For this method to work, you need to do this one time table setup:
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
INTO Numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number)
Once the Numbers table is set up, create this function:
CREATE FUNCTION [dbo].[FN_ListToTableAll]
(
#SplitOn char(1) --REQUIRED, the character to split the #List string on
,#List varchar(8000)--REQUIRED, the list to split apart
)
RETURNS TABLE
AS
RETURN
(
----------------
--SINGLE QUERY-- --this WILL return empty rows
----------------
SELECT
ROW_NUMBER() OVER(ORDER BY number) AS RowNumber
,LTRIM(RTRIM(SUBSTRING(ListValue, number+1, CHARINDEX(#SplitOn, ListValue, number+1)-number - 1))) AS ListValue
FROM (
SELECT #SplitOn + #List + #SplitOn AS ListValue
) AS InnerQuery
INNER JOIN Numbers n ON n.Number < LEN(InnerQuery.ListValue)
WHERE SUBSTRING(ListValue, number, 1) = #SplitOn
);
GO
You can now easily split a CSV string into a table and join on it:
select * from dbo.FN_ListToTableAll(',','1,2,3,,,4,5,6777,,,')
OUTPUT:
RowNumber ListValue
----------- ----------
1 1
2 2
3 3
4
5
6 4
7 5
8 6777
9
10
11
(11 row(s) affected)
To make what you need work, do the following:
--this would be the existing table
DECLARE #OldData table (RowID int, RowStatus char(1))
INSERT INTO #OldData VALUES (10,'z')
INSERT INTO #OldData VALUES (20,'z')
INSERT INTO #OldData VALUES (30,'z')
INSERT INTO #OldData VALUES (70,'z')
INSERT INTO #OldData VALUES (80,'z')
INSERT INTO #OldData VALUES (90,'z')
--these would be the stored procedure input parameters
DECLARE #IDList varchar(500)
,#StatusList varchar(500)
SELECT #IDList='10,20,30,40,50,60'
,#StatusList='A,B,C,D,E,F'
--stored procedure local variable
DECLARE #InputList table (RowID int, RowStatus char(1))
--convert input prameters into a table
INSERT INTO #InputList
(RowID,RowStatus)
SELECT
i.ListValue,s.ListValue
FROM dbo.FN_ListToTableAll(',',#IDList) i
INNER JOIN dbo.FN_ListToTableAll(',',#StatusList) s ON i.RowNumber=s.RowNumber
--update all old existing rows
UPDATE o
SET RowStatus=i.RowStatus
FROM #OldData o WITH (UPDLOCK, HOLDLOCK) --to avoid race condition when there is high concurrency as per #emtucifor
INNER JOIN #InputList i ON o.RowID=i.RowID
--insert only the new rows
INSERT INTO #OldData
(RowID, RowStatus)
SELECT
i.RowID, i.RowStatus
FROM #InputList i
LEFT OUTER JOIN #OldData o ON i.RowID=o.RowID
WHERE o.RowID IS NULL
--display the old table
SELECT * FROM #OldData order BY RowID
OUTPUT:
RowID RowStatus
----------- ---------
10 A
20 B
30 C
40 D
50 E
60 F
70 z
80 z
90 z
(9 row(s) affected)
EDIT thanks to #Emtucifor click here for the tip about the race condition, I have included the locking hints in my answer, to prevent race condition problems when there is high concurrency.
There are various methods to accomplish the parts you ask are asking about.
Passing Values
There are dozens of ways to do this. Here are a few ideas to get you started:
Pass in a string of identifiers and parse it into a table, then join.
SQL 2008: Join to a table-valued parameter
Expect data to exist in a predefined temp table and join to it
Use a session-keyed permanent table
Put the code in a trigger and join to the INSERTED and DELETED tables in it.
Erland Sommarskog provides a wonderful comprehensive discussion of lists in sql server. In my opinion, the table-valued parameter in SQL 2008 is the most elegant solution for this.
Upsert/Merge
Perform a separate UPDATE and INSERT (two queries, one for each set, not row-by-row).
SQL 2008: MERGE.
An Important Gotcha
However, one thing that no one else has mentioned is that almost all upsert code, including SQL 2008 MERGE, suffers from race condition problems when there is high concurrency. Unless you use HOLDLOCK and other locking hints depending on what's being done, you will eventually run into conflicts. So you either need to lock, or respond to errors appropriately (some systems with huge transactions per second have used the error-response method successfully, instead of using locks).
One thing to realize is that different combinations of lock hints implicitly change the transaction isolation level, which affects what type of locks are acquired. This changes everything: which other locks are granted (such as a simple read), the timing of when a lock is escalated to update from update intent, and so on.
I strongly encourage you to read more detail on these race condition problems. You need to get this right.
Conditional Insert/Update Race Condition
“UPSERT” Race Condition With MERGE
Example Code
CREATE PROCEDURE dbo.PermitStatusUpdate
#PermitIDs varchar(8000), -- or (max)
#Status int
AS
SET NOCOUNT, XACT_ABORT ON -- see note below
BEGIN TRAN
DECLARE #Permits TABLE (
PermitID int NOT NULL PRIMARY KEY CLUSTERED
)
INSERT #Permits
SELECT Value FROM dbo.Split(#PermitIDs) -- split function of your choice
UPDATE S
SET
UpdatedOn = GETUTCDATE(),
Status = #Status
FROM
PermitStatus S WITH (UPDLOCK, HOLDLOCK)
INNER JOIN #Permits P ON S.PermitID = P.PermitID
INSERT PermitStatus (
PermitID,
UpdatedOn,
Status
)
SELECT
P.PermitID,
GetUTCDate(),
#Status
FROM #Permits P
WHERE NOT EXISTS (
SELECT 1
FROM PermitStatus S
WHERE P.PermitID = S.PermitID
)
COMMIT TRAN
RETURN ##ERROR;
Note: XACT_ABORT helps guarantee the explicit transaction is closed following a timeout or unexpected error.
To confirm that this handles the locking problem, open several query windows and execute an identical batch like so:
WAITFOR TIME '11:00:00' -- use a time in the near future
EXEC dbo.PermitStatusUpdate #PermitIDs = '123,124,125,126', 1
All of these different sessions will execute the stored procedure in nearly the same instant. Check each session for errors. If none exist, try the same test a few times more (since it's possible to not always have the race condition occur, especially with MERGE).
The writeups at the links I gave above give even more detail than I did here, and also describe what to do for the SQL 2008 MERGE statement as well. Please read those thoroughly to truly understand the issue.
Briefly, with MERGE, no explicit transaction is needed, but you do need to use SET XACT_ABORT ON and use a locking hint:
SET NOCOUNT, XACT_ABORT ON;
MERGE dbo.Table WITH (HOLDLOCK) AS TableAlias
...
This will prevent concurrency race conditions causing errors.
I also recommend that you do error handling after each data modification statement.
If you're using SQL Server 2008, you can use table valued parameters - you pass in a table of records into a stored procedure and then you can do a MERGE.
Passing in a table valued parameter would remove the need to parse CSV strings.
Edit:
ErikE has raised the point about race conditions, please refer to his answer and linked articles.
If you have SQL Server 2008, you can use MERGE. Here's an article describing this.
You should be able to do your insert and your update as two set based queries.
The code below was based on a data load procedure that I wrote a while ago that took data from a staging table and inserted or updated it into the main table.
I've tried to make it match your example, but you may need to tweak this (and create a table valued UDF to parse your CSV into a table of ids).
-- Update where the join on permitstatus matches
Update
PermitStatus
Set
[UpdatedOn]=GETUTCDATE(),
[Status]=staging.Status
From
PermitStatus status
Join
StagingTable staging
On
staging.PermitId = status.PermitId
-- Insert the new records, based on the Where Not Exists
Insert
PermitStatus(Updatedon, Status, PermitId)
Select (GETUTCDATE(), staging.status, staging.permitId
From
StagingTable staging
Where Not Exists
(
Select 1 from PermitStatus status
Where status.PermitId = staging.PermidId
)
Essentially you have an upsert stored procedure (eg. UpsertSinglePermit)
(like the code you have given above) for dealing with one row.
So the steps I see are to create a new stored procedure (UpsertNPermits) which does
a) Parse input string into n record entries (each record contains permit id and status)
b) Foreach entry in above, invoke UpsertSinglePermit
Related
I'm not sure if this is something I should do in T-SQL or not, and I'm pretty sure using the word 'iterate' was wrong in this context, since you should never iterate anything in sql. It should be a set based operation, correct? Anyway, here's the scenario:
I have a stored proc that returns many uniqueidentifiers (single column results). These ids are the primary keys of records in a another table. I need to set a flag on all the corresponding records in that table.
How do I do this without the use of cursors? Should be an easy one for you sql gurus!
This may not be the most efficient, but I would create a temp table to hold the results of the stored proc and then use that in a join against the target table. For example:
CREATE TABLE #t (uniqueid int)
INSERT INTO #t EXEC p_YourStoredProc
UPDATE TargetTable
SET a.FlagColumn = 1
FROM TargetTable a JOIN #t b
ON a.uniqueid = b.uniqueid
DROP TABLE #t
You could also change your stored proc to a user-defined function that returns a table with your uniqueidentifiers. You can joing directly to the UDF and treat it like a table which avoids having to create the extra temp table explicitly. Also, you can pass parameters into the function as you're calling it, making this a very flexible solution.
CREATE FUNCTION dbo.udfGetUniqueIDs
()
RETURNS TABLE
AS
RETURN
(
SELECT uniqueid FROM dbo.SomeWhere
)
GO
UPDATE dbo.TargetTable
SET a.FlagColumn = 1
FROM dbo.TargetTable a INNER JOIN dbo.udfGetUniqueIDs() b
ON a.uniqueid = b.uniqueid
Edit:
This will work on SQL Server 2000 and up...
Insert the results of the stored proc into a temporary table and join this to the table you want to update:
INSERT INTO #WorkTable
EXEC usp_WorkResults
UPDATE DataTable
SET Flag = Whatever
FROM DataTable
INNER JOIN #WorkTable
ON DataTable.Ket = #WorkTable.Key
If you upgrade to SQL 2008 then you can pass table parameters I believe. Otherwise, you're stuck with a global temporary table or creating a permanent table that includes a column for some sort of process ID to identify which call to the stored procedure is relevant.
How much room do you have in changing the stored procedure that generates the IDs? You could add code in there to handle it or have a parameter that lets you optionally flag the rows when it is called.
Use temporary tables or a table variable (you are using SS2005).
Although, that's not nest-able - if a stored proc uses that method then you can't dumpt that output into a temp table.
An ugly solution would be to have your procedure return the "next" id each time it is called by using the other table (or some flag on the existing table) to filter out the rows that it has already returned
You can use a temp table or table variable with an additional column:
DECLARE #MyTable TABLE (
Column1 uniqueidentifer,
...,
Checked bit
)
INSERT INTO #MyTable
SELECT [...], 0 FROM MyTable WHERE [...]
DECLARE #Continue bit
SET #Continue = 1
WHILE (#Continue)
BEGIN
SELECT #var1 = Column1,
#var2 = Column2,
...
FROM #MyTable
WHERE Checked = 1
IF #var1 IS NULL
SET #Continue = 0
ELSE
BEGIN
...
UPDATE #MyTable SET Checked = 1 WHERE Column1 = #var1
END
END
Edit: Actually, in your situation a join will be better; the code above is a cursorless iteration, which is overkill for your situation.
For a sync process, my SQL Server database should record a list items that have changed - table name and primary key.
The DB already has a table and stored procedure to do this:
EXEC #ErrCode = dbo.SyncQueueItem "tableName", 1234;
I'd like to add triggers to a table to call this stored procedure on INSERT, UPDATE, DELETE. How do I get the key? What's the simplest thing that could possibly work?
CREATE TABLE new_employees
(
id_num INT IDENTITY(1,1),
fname VARCHAR(20),
minit CHAR(1),
lname VARCHAR(30)
);
GO
IF OBJECT_ID ('dbo.sync_new_employees','TR') IS NOT NULL
DROP TRIGGER sync_new_employees;
GO
CREATE TRIGGER sync_new_employees
ON new_employees
AFTER INSERT, UPDATE, DELETE
AS
DECLARE #Key Int;
DECLARE #ErrCode Int;
-- How to get the key???
SELECT #Key = 12345;
EXEC #ErrCode = dbo.SyncQueueItem "new_employees", #key;
GO
The way to access the records changed by the operation is by using the Inserted and Deleted pseudo-tables that are provided to you by SQL Server.
Inserted contains any inserted records, or any updated records with their new values.
Deleted contains any deleted records, or any updated records with their old values.
More Info
When writing a trigger, to be safe, one should always code for the case when multiple records are acted upon. Unfortunately if you need to call a SP that means a loop - which isn't ideal.
The following code shows how this could be done for your example, and includes a method of detecting whether the operation is an Insert/Update/Delete.
declare #Key int, #ErrCode int, #Action varchar(6);
declare #Keys table (id int, [Action] varchar(6));
insert into #Keys (id, [Action])
select coalesce(I.id, D.id_num)
, case when I.id is not null and D.id is not null then 'Update' when I.id is not null then 'Insert' else 'Delete' end
from Inserted I
full join Deleted D on I.id_num = D.id_num;
while exists (select 1 from #Keys) begin
select top 1 #Key = id, #Action = [Action] from #Keys;
exec #ErrCode = dbo.SyncQueueItem 'new_employees', #key;
delete from #Keys where id = #Key;
end
Further: In addition to solving your specified problem its worth noting a couple of points regarding the bigger picture.
As #Damien_The_Unbeliever points out there are built in mechanisms to accomplish change tracking which will perform much better.
If you still wish to handle your own change tracking, it would perform better if you could arrange it such that you handle the entire recordset in one go as opposed to carrying out a row-by-row operation. There are 2 ways to accomplish this a) Move your change tracking code inside the trigger and don't use a SP. b) Use a "User Defined Table Type" to pass the record-set of changes to the SP.
You should use the Magic Table to get the data.
Usually, inserted and deleted tables are called Magic Tables in the context of a trigger. There are Inserted and Deleted magic tables in SQL Server. These tables are automatically created and managed by SQL Server internally to hold recently inserted, deleted and updated values during DML operations (Insert, Update and Delete) on a database table.
Inserted magic table
The Inserted table holds the recently inserted values, in other words, new data values. Hence recently added records are inserted into the Inserted table.
Deleted magic table
The Deleted table holds the recently deleted or updated values, in other words, old data values. Hence the old updated and deleted records are inserted into the Deleted table.
**You can use the inserted and deleted magic table to get the value of id_num **
SELECT top 1 #Key = id_num from inserted
Note: This code sample will only work for a single record for insert scenario. For Bulk insert/update scenarios you need to fetch records from inserted and deleted table stored in the temp table or variable and then loop through it to pass to your procedure or you can pass a table variable to your procedure and handle the multiple records there.
A DML trigger should operate set data else only one row will be processed. It can be something like this. And of course use magic tables inserted and deleted.
CREATE TRIGGER dbo.tr_employees
ON dbo.employees --the table from Northwind database
AFTER INSERT,DELETE,UPDATE
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
declare #tbl table (id int identity(1,1),delId int,insId int)
--Use "magic tables" inserted and deleted
insert #tbl(delId, insId)
select d.EmployeeID, i.EmployeeID
from inserted i --empty when "delete"
full join deleted d --empty when "insert"
on i.EmployeeID=d.EmployeeID
declare #id int,#key int,#action char
select top 1 #id=id, #key=isnull(delId, insId),
#action=case
when delId is null then 'I'
when insId is null then 'D'
else 'U' end --just in case you need the operation executed
from #tbl
--do something for each row
while #id is not null --instead of cursor
begin
--do the main action
--exec dbo.sync 'employees', #key, #action
--remove processed row
delete #tbl where id=#id
--refill #variables
select top 1 #id=id, #key=isnull(delId, insId),
#action=case
when delId is null then 'I'
when insId is null then 'D'
else 'U' end --just in case you need the operation executed
from #tbl
end
END
Not the best solution, but just a direct answer on the question:
SELECT #Key = COALESCE(deleted.id_num,inserted.id_num);
Also not the best way (if not the worst) (do not try this at home), but at least it will help with multiple values:
DECLARE #Key INT;
DECLARE triggerCursor CURSOR LOCAL FAST_FORWARD READ_ONLY
FOR SELECT COALESCE(i.id_num,d.id_num) AS [id_num]
FROM inserted i
FULL JOIN deleted d ON d.id_num = i.id_num
WHERE (
COALESCE(i.fname,'')<>COALESCE(d.fname,'')
OR COALESCE(i.minit,'')<>COALESCE(d.minit,'')
OR COALESCE(i.lname,'')<>COALESCE(d.lname,'')
)
;
OPEN triggerCursor;
FETCH NEXT FROM triggerCursor INTO #Key;
WHILE ##FETCH_STATUS = 0
BEGIN
EXEC #ErrCode = dbo.SyncQueueItem 'new_employees', #key;
FETCH NEXT FROM triggerCursor INTO #Key;
END
CLOSE triggerCursor;
DEALLOCATE triggerCursor;
Better way to use trigger based "value-change-tracker":
INSERT INTO [YourTableHistoryName] (id_num, fname, minit, lname, WhenHappened)
SELECT COALESCE(i.id_num,d.id_num) AS [id_num]
,i.fname,i.minit,i.lname,CURRENT_TIMESTAMP AS [WhenHeppened]
FROM inserted i
FULL JOIN deleted d ON d.id_num = i.id_num
WHERE ( COALESCE(i.fname,'')<>COALESCE(d.fname,'')
OR COALESCE(i.minit,'')<>COALESCE(d.minit,'')
OR COALESCE(i.lname,'')<>COALESCE(d.lname,'')
)
;
The best (in my opinion) way to track changes is to use Temporal tables (SQL Server 2016+)
inserted/deleted in triggers will generate as many rows as touched and calling a stored proc per key would require a cursor or similar approach per row.
You should check timestamp/rowversion in SQL Server. You could add that to the all tables in question (not null, auto increment, unique within database for each table/row etc).
You could add a unique index on that column to all tables you added the column.
##DBTS is the current timestamp, you can store today's ##DBTS and tomorrow you will scan all tables from that to current ##DBTS. timestamp/rowversion will be incremented for all updates and inserts but for deletes it won't track, for deletes you can have a delete only trigger and insert keys into a different table.
Change data capture or change tracking could do this easier, but if there is heavy volumes on the server or large number of data loads, partition switches scanning the transaction log becomes a bottleneck and in some cases you will have to remove change data capture to save the transaction log from growing indefinetely.
I have table in which the data is been continuously added at a rapid pace.
And i need to fetch record from this table and immediately remove them so i cannot process the same record second time. And since the data is been added at a faster rate, i need to use the TOP clause so only small number of records go to business logic for processing at the time.
I am using the below query to
BEGIN TRAN readrowdata
SELECT
top 5 [RawDataId],
[RawData]
FROM
[TABLE] with(HOLDLOCK)
WITH q AS
(
SELECT
top 5 [RawDataId],
[RawData]
FROM
[TABLE] with(HOLDLOCK)
)
DELETE from q
COMMIT TRANSACTION readrowdata
I am using the HOLDLOCK here, so new data cannot insert into the table while i am performing the SELECT and DELETE operation. I used it because Suppose if there are only 3 records in the table now, so the SELECT statement will get 3 records and in the same time new record gets inserted and the DELETE statement will delete 4 records. So i will loose 1 data here.
Is the query is ok in performance term? If i can improve it then please provide me your suggestion.
Thank you
Personally, I'd use a different approach. One with less locking, but also extra information signifying that certain records are currently being processed...
DECLARE #rowsBeingProcessed TABLE (
id INT
);
WITH rows AS (
SELECT top 5 [RawDataId] FROM yourTable WHERE processing_start IS NULL
)
UPDATE rows SET processing_start = getDate() WHERE processing_start IS NULL
OUTPUT INSERTED.RowDataID INTO #rowsBeingProcessed;
-- Business Logic Here
DELETE yourTable WHERE RowDataID IN (SELECT id FROM #rowsBeingProcessed);
Then you can also add checks like "if a record has been 'beingProcessed' for more than 10 minutes, assume that the business logic failed", etc, etc.
By locking the table in this way, you force other processes to wait for your transaction to complete. This can have very rapid consequences on scalability and performance - and it tends to be hard to predict, because there's often a chain of components all relying on your database.
If you have multiple clients each running this query, and multiple clients adding new rows to the table, the overall system performance is likely to deteriorate at some times, as each "read" client is waiting for a lock, the number of "write" clients waiting to insert data grows, and they in turn may tie up other components (whatever is generating the data you want to insert).
Diego's answer is on the money - put the data into a variable, and delete matching rows. Don't use locks in SQL Server if you can possibly avoid it!
You can do it very easily with TRIGGERS. Below mentioned is a kind of situation which will help you need not to hold other users which are trying to insert data simultaneously. Like below...
Data Definition language
CREATE TABLE SampleTable
(
id int
)
Sample Record
insert into SampleTable(id)Values(1)
Sample Trigger
CREATE TRIGGER SampleTableTrigger
on SampleTable AFTER INSERT
AS
IF Exists(SELECT id FROM INSERTED)
BEGIN
Set NOCOUNT ON
SET XACT_ABORT ON
Begin Try
Begin Tran
Select ID From Inserted
DELETE From yourTable WHERE ID IN (SELECT id FROM Inserted);
Commit Tran
End Try
Begin Catch
Rollback Tran
End Catch
End
Hope this is very simple and helpful
If I understand you correctly, you are worried that between your select and your delete, more records would be inserted and the first TOP 5 would be different then the second TOP 5?
If that so, why don't you load your first select into a temp table or variable (or at least the PKs) do whatever you have to do with your data and then do your delete based on this table?
I know that it's old question, but I found some solution here https://www.simple-talk.com/sql/learn-sql-server/the-delete-statement-in-sql-server/:
DECLARE #Output table
(
StaffID INT,
FirstName NVARCHAR(50),
LastName NVARCHAR(50),
CountryRegion NVARCHAR(50)
);
DELETE SalesStaff
OUTPUT DELETED.* INTO #Output
FROM Sales.vSalesPerson sp
INNER JOIN dbo.SalesStaff ss
ON sp.BusinessEntityID = ss.StaffID
WHERE sp.SalesLastYear = 0;
SELECT * FROM #output;
Maybe it will be helpfull for you.
I've got a simple queue implementation in MS Sql Server 2008 R2. Here's the essense of the queue:
CREATE TABLE ToBeProcessed
(
Id BIGINT IDENTITY(1,1) PRIMARY KEY NOT NULL,
[Priority] INT DEFAULT(100) NOT NULL,
IsBeingProcessed BIT default (0) NOT NULL,
SomeData nvarchar(MAX) NOT null
)
I want to atomically select the top n rows ordered by the priority and the id where IsBeingProcessed is false and update those rows to say they are being processed. I thought I'd use a combination of Update, Top, Output and Order By but unfortunately you can't use top and order by in an Update statement.
So I've made an in clause to restrict the update and that sub query does the order by (see below). My question is, is this whole statement atomic, or do I need to wrap it in a transaction?
DECLARE #numberToProcess INT = 2
CREATE TABLE #IdsToProcess
(
Id BIGINT NOT null
)
UPDATE
ToBeProcessed
SET
ToBeProcessed.IsBeingProcessed = 1
OUTPUT
INSERTED.Id
INTO
#IdsToProcess
WHERE
ToBeProcessed.Id IN
(
SELECT TOP(#numberToProcess)
ToBeProcessed.Id
FROM
ToBeProcessed
WHERE
ToBeProcessed.IsBeingProcessed = 0
ORDER BY
ToBeProcessed.Id,
ToBeProcessed.Priority DESC)
SELECT
*
FROM
#IdsToProcess
DROP TABLE #IdsToProcess
Here's some sql to insert some dummy rows:
INSERT INTO ToBeProcessed (SomeData) VALUES (N'');
INSERT INTO ToBeProcessed (SomeData) VALUES (N'');
INSERT INTO ToBeProcessed (SomeData) VALUES (N'');
INSERT INTO ToBeProcessed (SomeData) VALUES (N'');
INSERT INTO ToBeProcessed (SomeData) VALUES (N'');
If I understand the motivation for the question you want to avoid the possibility that two concurrent transactions could both execute the sub query to get the top N rows to process then proceed to update the same rows?
In that case I'd use this approach.
;WITH cte As
(
SELECT TOP(#numberToProcess)
*
FROM
ToBeProcessed WITH(UPDLOCK,ROWLOCK,READPAST)
WHERE
ToBeProcessed.IsBeingProcessed = 0
ORDER BY
ToBeProcessed.Id,
ToBeProcessed.Priority DESC
)
UPDATE
cte
SET
IsBeingProcessed = 1
OUTPUT
INSERTED.Id
INTO
#IdsToProcess
I was a bit uncertain earlier whether SQL Server would take U locks when processing your version with the sub query thus blocking two concurrent transactions from reading the same TOP N rows. This does not appear to be the case.
Test Table
CREATE TABLE JobsToProcess
(
priority INT IDENTITY(1,1),
isprocessed BIT ,
number INT
)
INSERT INTO JobsToProcess
SELECT TOP (1000000) 0,0
FROM master..spt_values v1, master..spt_values v2
Test Script (Run in 2 concurrent SSMS sessions)
BEGIN TRY
DECLARE #FinishedMessage VARBINARY (128) = CAST('TestFinished' AS VARBINARY (128))
DECLARE #SynchMessage VARBINARY (128) = CAST('TestSynchronising' AS VARBINARY (128))
SET CONTEXT_INFO #SynchMessage
DECLARE #OtherSpid int
WHILE(#OtherSpid IS NULL)
SELECT #OtherSpid=spid
FROM sys.sysprocesses
WHERE context_info=#SynchMessage and spid<>##SPID
SELECT #OtherSpid
DECLARE #increment INT = ##spid
DECLARE #number INT = #increment
WHILE (#number = #increment AND NOT EXISTS(SELECT * FROM sys.sysprocesses WHERE context_info=#FinishedMessage))
UPDATE JobsToProcess
SET #number=number +=#increment,isprocessed=1
WHERE priority = (SELECT TOP 1 priority
FROM JobsToProcess
WHERE isprocessed=0
ORDER BY priority DESC)
SELECT *
FROM JobsToProcess
WHERE number not in (0,#OtherSpid,##spid)
SET CONTEXT_INFO #FinishedMessage
END TRY
BEGIN CATCH
SET CONTEXT_INFO #FinishedMessage
SELECT ERROR_MESSAGE(), ERROR_NUMBER()
END CATCH
Almost immediately execution stops as both concurrent transactions update the same row so the S locks taken whilst identifying the TOP 1 priority must get released before it aquires a U lock then the 2 transactions proceed to get the row U and X lock in sequence.
If a CI is added ALTER TABLE JobsToProcess ADD PRIMARY KEY CLUSTERED (priority) then deadlock occurs almost immediately instead as in this case the row S lock doesn't get released, one transaction aquires a U lock on the row and waits to convert it to an X lock and the other transaction is still waiting to convert its S lock to a U lock.
If the query above is changed to use MIN rather than TOP
WHERE priority = (SELECT MIN(priority)
FROM JobsToProcess
WHERE isprocessed=0
)
Then SQL Server manages to completely eliminate the sub query from the plan and takes U locks all the way.
Every individual T-SQL statement is, according to all my experience and all the documenation I've ever read, supposed to be atomic. What you have there is a single T-SQL statement, ergo is should be atomic and will not require explicit transaction statements. I've used this precise kind of logic many times, and never had a problem with it. I look forward to seeing if anyone as a supportable alternate opinion.
Incidentally, look into the ranking functions, specifically row_number(), for retrieving a set number of items. The syntax is perhaps a tad awkward, but overall they are flexible and powerful tools. (There are about a bazillion Stack Overlow questions and answers that discuss them.)
I've got a stored procedure, which selects 1 record back. the stored procedure could be called from several different applications on different PCs. The idea is that the stored procedure brings back the next record that needs to be processed, and if two applications call the stored proc at the same time, the same record should not be brought back. My query is below, I'm trying to write the query as efficiently as possible (sql 2008). Can it get done more efficiently than this?
CREATE PROCEDURE GetNextUnprocessedRecord
AS
BEGIN
SET NOCOUNT ON;
--ID of record we want to select back
DECLARE #iID BIGINT
-- Find the next processable record, and mark it as dispatched
-- Must be done in a transaction to ensure no other query can get
-- this record between the read and update
BEGIN TRAN
SELECT TOP 1
#iID = [ID]
FROM
--Don't read locked records, only lock the specific record
[MyRecords] WITH (READPAST, ROWLOCK)
WHERE
[Dispatched] is null
ORDER BY
[Received]
--Mark record as picked up for processing
UPDATE
[MyRecords]
SET
[Dispatched] = GETDATE()
WHERE
[ID] = #iID
COMMIT TRAN
--Select back the specific record
SELECT
[ID],
[Data]
FROM
[MyRecords] WITH (NOLOCK, READPAST)
WHERE
[ID] = #iID
END
Using the READPAST locking hint is correct and your SQL looks OK.
I'd add use XLOCK though which is also HOLDLOCK/SERIALIZABLE
...
[MyRecords] WITH (READPAST, ROWLOCK, XLOCK)
...
This means you get the ID, and exclusively lock that row while you carry on and update it.
Edit: add an index on Dispatched and Received columns to make it quicker. If [ID] (I assume it's the PK) is not clustered, INCLUDE [ID]. And filter the index too because it's SQL 2008
You could also use this construct which does it all in one go without XLOCK or HOLDLOCK
UPDATE
MyRecords
SET
--record the row ID
#id = [ID],
--flag doing stuff
[Dispatched] = GETDATE()
WHERE
[ID] = (SELECT TOP 1 [ID] FROM MyRecords WITH (ROWLOCK, READPAST) WHERE Dispatched IS NULL ORDER BY Received)
UPDATE, assign, set in one
You can assign each picker process a unique id, and add columns pickerproc and pickstate to your records. Then
UPDATE MyRecords
SET pickerproc = myproc,
pickstate = 'I' -- for 'I'n process
WHERE Id = (SELECT MAX(Id) FROM MyRecords WHERE pickstate = 'A') -- 'A'vailable
That gets you your record in one atomic step, and you can do the rest of your processing at your leisure. Then you can set pickstate to 'C'omplete', 'E'rror, or whatever when it's resolved.
I think Mitch is referring to another good technique where you create a message-queue table and insert the Ids there. There are several SO threads - search for 'message queue table'.
You can keep MyRecords on a "MEMORY" table for faster processing.