Efficient transaction, record locking - sql-server

I've got a stored procedure that selects one record. The stored procedure could be called from several different applications on different PCs. The idea is that it brings back the next record that needs to be processed, and if two applications call it at the same time, the same record should not be returned to both. My query is below; I'm trying to write it as efficiently as possible (SQL Server 2008). Can it be done more efficiently than this?
CREATE PROCEDURE GetNextUnprocessedRecord
AS
BEGIN
SET NOCOUNT ON;
--ID of record we want to select back
DECLARE @iID BIGINT
-- Find the next processable record, and mark it as dispatched
-- Must be done in a transaction to ensure no other query can get
-- this record between the read and update
BEGIN TRAN
SELECT TOP 1
@iID = [ID]
FROM
--Don't read locked records, only lock the specific record
[MyRecords] WITH (READPAST, ROWLOCK)
WHERE
[Dispatched] is null
ORDER BY
[Received]
--Mark record as picked up for processing
UPDATE
[MyRecords]
SET
[Dispatched] = GETDATE()
WHERE
[ID] = @iID
COMMIT TRAN
--Select back the specific record
SELECT
[ID],
[Data]
FROM
[MyRecords] WITH (NOLOCK, READPAST)
WHERE
[ID] = @iID
END

Using the READPAST locking hint is correct and your SQL looks OK.
I'd also add XLOCK, though, so that the row is locked exclusively and the lock is held until the transaction ends.
...
[MyRecords] WITH (READPAST, ROWLOCK, XLOCK)
...
This means you get the ID, and exclusively lock that row while you carry on and update it.
Edit: add an index on the Dispatched and Received columns to make it quicker. If [ID] (I assume it's the PK) is not clustered, INCLUDE [ID]. Since this is SQL Server 2008, you can also make it a filtered index.
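A minimal sketch of such a filtered index, assuming the table lives in the dbo schema; the index name is hypothetical. Because the filter already restricts the index to undispatched rows, Received alone is used as the key here, and Dispatched is carried as an included column so the optimizer does not need a lookup to re-check the IS NULL predicate.

```sql
-- Hypothetical index name; dbo schema is an assumption.
-- Only rows that still need dispatching are stored in the index,
-- so it stays small as processed rows accumulate in the table.
CREATE NONCLUSTERED INDEX IX_MyRecords_Undispatched
    ON dbo.MyRecords ([Received])
    INCLUDE ([ID], [Dispatched])
    WHERE [Dispatched] IS NULL;
```

With this in place, the TOP 1 ... ORDER BY [Received] lookup can read the first entry of the index instead of scanning and sorting the table.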
You could also use this construct which does it all in one go without XLOCK or HOLDLOCK
UPDATE
MyRecords
SET
--record the row ID
@id = [ID],
--flag doing stuff
[Dispatched] = GETDATE()
WHERE
[ID] = (SELECT TOP 1 [ID] FROM MyRecords WITH (ROWLOCK, READPAST) WHERE Dispatched IS NULL ORDER BY Received)
UPDATE, assign, set in one
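The single-statement idea above can be taken one step further by returning the row from the same statement with an OUTPUT clause, which removes the final SELECT. This is only a sketch under assumptions: the procedure name is hypothetical, the dbo schema is assumed, and OUTPUT without INTO requires that MyRecords has no enabled triggers.

```sql
-- Hypothetical procedure name; not part of the original question.
CREATE PROCEDURE dbo.GetNextUnprocessedRecord_SingleStatement
AS
BEGIN
    SET NOCOUNT ON;

    -- Pick the oldest undispatched row, skipping rows locked by other callers.
    WITH next_record AS (
        SELECT TOP (1) [ID], [Data], [Dispatched]
        FROM dbo.MyRecords WITH (ROWLOCK, READPAST, UPDLOCK)
        WHERE [Dispatched] IS NULL
        ORDER BY [Received]
    )
    -- Mark it as dispatched and return it to the caller in one statement.
    UPDATE next_record
    SET [Dispatched] = GETDATE()
    OUTPUT inserted.[ID], inserted.[Data];
END
```

If no row is available, the procedure returns an empty result set, which the calling application can treat as "nothing to process".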

You can assign each picker process a unique id, and add columns pickerproc and pickstate to your records. Then
UPDATE MyRecords
SET pickerproc = myproc,
pickstate = 'I' -- for 'I'n process
WHERE Id = (SELECT MAX(Id) FROM MyRecords WHERE pickstate = 'A') -- 'A'vailable
That gets you your record in one atomic step, and you can do the rest of your processing at your leisure. Then you can set pickstate to 'C'omplete, 'E'rror, or whatever when it's resolved.
I think Mitch is referring to another good technique where you create a message-queue table and insert the Ids there. There are several SO threads - search for 'message queue table'.

You can keep MyRecords on a "MEMORY" table for faster processing.

Related

Optimising concurrency solution for Select-then-update pattern using WITH (READPAST, ROWLOCK, XLOCK)

Assume I need to write a ticket sales system. A number of tickets are put in the pool for sale. When an order is placed I update the ticket record to mark that the ticket is bound to the order. The ticket-order relationship table is as follow. 3 tickets are put in the pool for testing.
IF OBJECT_ID (N'Demo_TicketOrder', N'U') IS NOT NULL DROP TABLE [Demo_TicketOrder];
CREATE TABLE [dbo].[Demo_TicketOrder] (
[TicketId] int NOT NULL,
[OrderId] int NULL
INDEX IX_OrderId_TicketId (OrderId, TicketId),
);
INSERT INTO Demo_TicketOrder VALUES (1, NULL)
INSERT INTO Demo_TicketOrder VALUES (2, NULL)
INSERT INTO Demo_TicketOrder VALUES (3, NULL)
SELECT * FROM Demo_TicketOrder
Below is the script I wrote that will be invoked by an ASP.NET app. The @OrderId will be passed as a parameter from the app. For testing purposes I hard-coded it to 1. I have another window open with @OrderId set to 2, so I can simulate two concurrent requests.
DECLARE @OrderId AS INT = 1
BEGIN TRANSACTION PlaceOrder
BEGIN TRY
DECLARE @ticketId AS INT;
SELECT TOP 1 @ticketId = TicketId FROM Demo_TicketOrder WITH (READPAST, ROWLOCK, XLOCK) WHERE [OrderId] is NULL ORDER BY TicketId;
IF @@ROWCOUNT != 1 THROW 50001, 'No tickets left!', 1;
WAITFOR DELAY '00:00:05'; -- Simulate a delay so that concurrent requests overlap
UPDATE Demo_TicketOrder WITH (ROWLOCK) SET [OrderId] = @OrderId WHERE [OrderId] IS NULL AND [TicketId] = @ticketId AND NOT EXISTS (SELECT 1 FROM Demo_TicketOrder WHERE OrderId = @OrderId );
IF @@ROWCOUNT != 1
BEGIN
DECLARE @ErrorMessage AS NVARCHAR(MAX) = CONCAT('Optimistic lock activated! TicketId=', CAST(@ticketId AS VARCHAR(20)));
THROW 50002, @ErrorMessage, 2;
END
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION PlaceOrder;
THROW;
END CATCH;
COMMIT TRANSACTION PlaceOrder;
SELECT * FROM Demo_TicketOrder WHERE [TicketId] = @ticketId;
My targets for this piece of code are:
Handle concurrent requests efficiently.
That's why I can't simply do SELECT then UPDATE WHERE OrderId IS NULL: a lot of requests would fail when request volume increases.
Do not allow two orders to be bound to one ticket.
By using ROWLOCK, XLOCK in the SELECT I assume every request will get an unassigned ticket. There is also an optimistic compare-and-update mechanism in the UPDATE statement as a safety net should the lock fail.
While a request is processing, do not block newly arriving requests.
By using READPAST I expect all new requests to get the next available ticket immediately, without waiting for the first request to COMMIT.
In the off chance that two requests with the same OrderId arrive, make sure only one is served.
With the NOT EXISTS condition in the UPDATE statement I assume this is done.
Why ask this question:
I came up with this solution on my own because I did not find a mature pattern after extensive searching. But I think this kind of problem is quite common, which makes me worry that I may be over-complicating things or have left something unconsidered, as I'm new to T-SQL (I have always used EF6). What worries me more is that I never see XLOCK being used online except in suggestions against it. Days have gone into testing this piece of code and so far it seems OK, but I just want to make sure.
QUESTION A.
Does this code cover my targets? Could it be simplified (without using queueing middle ware on the app level - that'd be another thing)?
QUESTION B.
While testing I found compound index INDEX IX_OrderId_TicketId (OrderId, TicketId) to be necessary. I can't understand why if I leave out the OrderId (having only IX_TicketId), I'll - 100% replicable - get a deadlock on the second request.
It seems to me this is unduly complex for the need. Consider a unique filtered index on OrderId to ensure the order is assigned to only 1 ticket. I expect a default pessimistic concurrency technique would provide adequate throughput (> 1K per second) without resorting to READPAST:
IF OBJECT_ID (N'Demo_TicketOrder', N'U') IS NOT NULL
DROP TABLE [Demo_TicketOrder];
CREATE TABLE dbo.Demo_TicketOrder (
TicketId int NOT NULL
CONSTRAINT PK_Demo_TicketOrder PRIMARY KEY NONCLUSTERED
, OrderId int NULL
);
CREATE CLUSTERED INDEX Demo_TicketOrder_OrderId ON Demo_TicketOrder(OrderId);
CREATE UNIQUE INDEX Demo_TicketOrder_OrderId_NotNull ON Demo_TicketOrder(OrderId) WHERE OrderId IS NOT NULL;
GO
CREATE OR ALTER PROC dbo.usp_UpdateTicket
@OrderID int
AS
SET NOCOUNT ON;
SET XACT_ABORT ON;
UPDATE TOP(1) dbo.Demo_TicketOrder
SET OrderId = @OrderId
WHERE OrderID IS NULL;
IF @@ROWCOUNT = 0 THROW 50001, 'No tickets left!', 1;
GO
Regarding the deadlock without OrderId as the first column, the subquery in the UPDATE is by OrderId so the table must be scanned without a supporting index. The scan is blocked when it encounters the row locked by the other session. The other session is similarly blocked when it tries to execute the update, resulting in the deadlock.
EDIT:
The order of assigned tickets is undefined with the above UPDATE TOP(1) method. There is no provision for ORDER BY with this syntax but that doesn't matter if the tickets are homogeneous.
If you have a requirement to assign orders to tickets in TicketId sequence, you could use a CTE or similar technique along with an UPDLOCK hint (to avoid deadlocking) and add TicketId to the clustered index key (to efficiently find the lowest unassigned TicketId).
CREATE CLUSTERED INDEX idx_Demo_TicketOrder_OrderId_TicketId ON Demo_TicketOrder(OrderId, TicketId);
GO
CREATE OR ALTER PROC dbo.usp_UpdateTicketV2
@OrderID int
AS
SET NOCOUNT ON;
SET XACT_ABORT ON;
WITH next_available_ticket AS (
SELECT TOP(1)
TicketID
, OrderId
FROM dbo.Demo_TicketOrder AS t WITH(UPDLOCK)
WHERE t.OrderId IS NULL
ORDER BY t.TicketId
)
UPDATE next_available_ticket
SET OrderId = @OrderId;
IF @@ROWCOUNT = 0 THROW 50001, 'No tickets left!', 1;
GO

Trigger AFTER INSERT, UPDATE, DELETE to call stored procedure with table name and primary key

For a sync process, my SQL Server database should record a list of items that have changed: table name and primary key.
The DB already has a table and stored procedure to do this:
EXEC @ErrCode = dbo.SyncQueueItem "tableName", 1234;
I'd like to add triggers to a table to call this stored procedure on INSERT, UPDATE, DELETE. How do I get the key? What's the simplest thing that could possibly work?
CREATE TABLE new_employees
(
id_num INT IDENTITY(1,1),
fname VARCHAR(20),
minit CHAR(1),
lname VARCHAR(30)
);
GO
IF OBJECT_ID ('dbo.sync_new_employees','TR') IS NOT NULL
DROP TRIGGER sync_new_employees;
GO
CREATE TRIGGER sync_new_employees
ON new_employees
AFTER INSERT, UPDATE, DELETE
AS
DECLARE @Key Int;
DECLARE @ErrCode Int;
-- How to get the key???
SELECT @Key = 12345;
EXEC @ErrCode = dbo.SyncQueueItem "new_employees", @Key;
GO
The way to access the records changed by the operation is by using the Inserted and Deleted pseudo-tables that are provided to you by SQL Server.
Inserted contains any inserted records, or any updated records with their new values.
Deleted contains any deleted records, or any updated records with their old values.
More Info
When writing a trigger, to be safe, one should always code for the case when multiple records are acted upon. Unfortunately if you need to call a SP that means a loop - which isn't ideal.
The following code shows how this could be done for your example, and includes a method of detecting whether the operation is an Insert/Update/Delete.
declare @Key int, @ErrCode int, @Action varchar(6);
declare @Keys table (id int, [Action] varchar(6));
insert into @Keys (id, [Action])
select coalesce(I.id_num, D.id_num)
, case when I.id_num is not null and D.id_num is not null then 'Update' when I.id_num is not null then 'Insert' else 'Delete' end
from Inserted I
full join Deleted D on I.id_num = D.id_num;
while exists (select 1 from @Keys) begin
select top 1 @Key = id, @Action = [Action] from @Keys;
exec @ErrCode = dbo.SyncQueueItem 'new_employees', @Key;
delete from @Keys where id = @Key;
end
Further: In addition to solving your specified problem, it's worth noting a couple of points regarding the bigger picture.
As @Damien_The_Unbeliever points out, there are built-in mechanisms to accomplish change tracking which will perform much better.
If you still wish to handle your own change tracking, it would perform better if you could arrange it such that you handle the entire recordset in one go as opposed to carrying out a row-by-row operation. There are 2 ways to accomplish this a) Move your change tracking code inside the trigger and don't use a SP. b) Use a "User Defined Table Type" to pass the record-set of changes to the SP.
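Option (b) could look roughly like the following sketch. Everything new here is hypothetical and not part of the original database: the table type dbo.SyncKeyList, the set-based procedure dbo.SyncQueueItems, and the queue table dbo.SyncQueue (TableName, KeyValue) that it is assumed to write to. Table-valued parameters require SQL Server 2008 or later.

```sql
-- Hypothetical table type that carries the changed keys in one go.
CREATE TYPE dbo.SyncKeyList AS TABLE (KeyValue int NOT NULL PRIMARY KEY);
GO
-- Hypothetical set-based counterpart of dbo.SyncQueueItem.
-- dbo.SyncQueue (TableName, KeyValue) is an assumed queue table.
CREATE PROCEDURE dbo.SyncQueueItems
    @TableName sysname,
    @Keys dbo.SyncKeyList READONLY
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.SyncQueue (TableName, KeyValue)
    SELECT @TableName, KeyValue
    FROM @Keys;
END
GO
CREATE TRIGGER sync_new_employees
ON new_employees
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @Keys dbo.SyncKeyList;

    -- UNION removes duplicates, so an updated row is queued only once.
    INSERT INTO @Keys (KeyValue)
    SELECT id_num FROM inserted
    UNION
    SELECT id_num FROM deleted;

    EXEC dbo.SyncQueueItems @TableName = N'new_employees', @Keys = @Keys;
END
GO
```

The trigger makes a single procedure call per statement regardless of how many rows were affected, which avoids the row-by-row loop.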
You should use the Magic Table to get the data.
Usually, inserted and deleted tables are called Magic Tables in the context of a trigger. There are Inserted and Deleted magic tables in SQL Server. These tables are automatically created and managed by SQL Server internally to hold recently inserted, deleted and updated values during DML operations (Insert, Update and Delete) on a database table.
Inserted magic table
The Inserted table holds the recently inserted values, in other words, new data values. Hence recently added records are inserted into the Inserted table.
Deleted magic table
The Deleted table holds the recently deleted or updated values, in other words, old data values. Hence the old updated and deleted records are inserted into the Deleted table.
**You can use the inserted and deleted magic table to get the value of id_num **
SELECT top 1 @Key = id_num from inserted
Note: This code sample will only work for a single record for insert scenario. For Bulk insert/update scenarios you need to fetch records from inserted and deleted table stored in the temp table or variable and then loop through it to pass to your procedure or you can pass a table variable to your procedure and handle the multiple records there.
A DML trigger should operate on the whole set of affected rows, otherwise only one row will be processed. It can be something like this, using the magic tables inserted and deleted.
CREATE TRIGGER dbo.tr_employees
ON dbo.employees --the table from Northwind database
AFTER INSERT,DELETE,UPDATE
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
declare @tbl table (id int identity(1,1),delId int,insId int)
--Use "magic tables" inserted and deleted
insert @tbl(delId, insId)
select d.EmployeeID, i.EmployeeID
from inserted i --empty when "delete"
full join deleted d --empty when "insert"
on i.EmployeeID=d.EmployeeID
declare @id int,@key int,@action char
select top 1 @id=id, @key=isnull(delId, insId),
@action=case
when delId is null then 'I'
when insId is null then 'D'
else 'U' end --just in case you need the operation executed
from @tbl
--do something for each row
while @id is not null --instead of cursor
begin
--do the main action
--exec dbo.sync 'employees', @key, @action
--remove processed row
delete @tbl where id=@id
--reset @id first: an assignment SELECT that finds no rows leaves the variable unchanged
set @id = null
--refill @variables
select top 1 @id=id, @key=isnull(delId, insId),
@action=case
when delId is null then 'I'
when insId is null then 'D'
else 'U' end --just in case you need the operation executed
from @tbl
end
END
Not the best solution, but just a direct answer on the question:
SELECT @Key = COALESCE(d.id_num, i.id_num) FROM inserted i FULL JOIN deleted d ON d.id_num = i.id_num;
Also not the best way (if not the worst) (do not try this at home), but at least it will help with multiple values:
DECLARE @Key INT;
DECLARE @ErrCode INT;
DECLARE triggerCursor CURSOR LOCAL FAST_FORWARD READ_ONLY
FOR SELECT COALESCE(i.id_num,d.id_num) AS [id_num]
FROM inserted i
FULL JOIN deleted d ON d.id_num = i.id_num
WHERE (
COALESCE(i.fname,'')<>COALESCE(d.fname,'')
OR COALESCE(i.minit,'')<>COALESCE(d.minit,'')
OR COALESCE(i.lname,'')<>COALESCE(d.lname,'')
)
;
OPEN triggerCursor;
FETCH NEXT FROM triggerCursor INTO @Key;
WHILE @@FETCH_STATUS = 0
BEGIN
EXEC @ErrCode = dbo.SyncQueueItem 'new_employees', @Key;
FETCH NEXT FROM triggerCursor INTO @Key;
END
CLOSE triggerCursor;
DEALLOCATE triggerCursor;
Better way to use trigger based "value-change-tracker":
INSERT INTO [YourTableHistoryName] (id_num, fname, minit, lname, WhenHappened)
SELECT COALESCE(i.id_num,d.id_num) AS [id_num]
,i.fname,i.minit,i.lname,CURRENT_TIMESTAMP AS [WhenHappened]
FROM inserted i
FULL JOIN deleted d ON d.id_num = i.id_num
WHERE ( COALESCE(i.fname,'')<>COALESCE(d.fname,'')
OR COALESCE(i.minit,'')<>COALESCE(d.minit,'')
OR COALESCE(i.lname,'')<>COALESCE(d.lname,'')
)
;
The best (in my opinion) way to track changes is to use Temporal tables (SQL Server 2016+)
inserted/deleted in triggers will generate as many rows as touched and calling a stored proc per key would require a cursor or similar approach per row.
You should check timestamp/rowversion in SQL Server. You could add that to the all tables in question (not null, auto increment, unique within database for each table/row etc).
You could add a unique index on that column to all tables you added the column.
@@DBTS is the current timestamp: you can store today's @@DBTS, and tomorrow scan all tables from that value to the current @@DBTS. The timestamp/rowversion is incremented for all updates and inserts, but it does not track deletes; for those you can have a delete-only trigger that inserts the keys into a different table.
Change data capture or change tracking could do this more easily, but on a server with heavy volumes, a large number of data loads, or partition switches, scanning the transaction log becomes a bottleneck, and in some cases you will have to remove change data capture to keep the transaction log from growing indefinitely.
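The rowversion approach described in this answer could be sketched as follows for the new_employees table from the question. The column name row_ver and the index name are hypothetical, and how the previous high-water mark is stored between runs is left as an assumption.

```sql
-- Add a rowversion column (hypothetical name row_ver); SQL Server maintains it automatically.
ALTER TABLE dbo.new_employees ADD row_ver rowversion;
GO
CREATE UNIQUE INDEX IX_new_employees_row_ver ON dbo.new_employees (row_ver);
GO
-- @LastSync would be loaded from wherever the previous run saved it (assumption).
DECLARE @LastSync binary(8) = 0x0000000000000000;
DECLARE @Current  binary(8) = @@DBTS;   -- last rowversion used in this database

-- Rows inserted or updated since the previous sync.
SELECT id_num
FROM dbo.new_employees
WHERE row_ver > @LastSync
  AND row_ver <= @Current;

-- Persist @Current as the starting point for the next run.
```

Deleted rows do not appear in this result, which is why the answer pairs it with a delete-only trigger.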

SQL Server custom counter stored procedure creating dupes

I created a stored procedure to implement rate limiting on my API, this is called about 5-10k times a second and each day I'm noticing dupes in the counter table.
It looks up the API key being passed in and then checks the counter table with the ID and date combination using an "UPSERT" and if it finds a result it does an UPDATE [count]+1 and if not it will INSERT a new row.
There is no primary key in the counter table.
Here is the stored procedure:
USE [omdb]
GO
/****** Object: StoredProcedure [dbo].[CheckKey] Script Date: 6/17/2017 10:39:37 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[CheckKey] (
@apikey AS VARCHAR(10)
)
AS
BEGIN
SET NOCOUNT ON;
DECLARE @userID as int
DECLARE @limit as int
DECLARE @curCount as int
DECLARE @curDate as Date = GETDATE()
SELECT @userID = id, @limit = limit FROM [users] WHERE apiKey = @apikey
IF @userID IS NULL
BEGIN
--Key not found
SELECT 'False' as [Response], 'Invalid API key!' as [Reason]
END
ELSE
BEGIN
--Key found
BEGIN TRANSACTION Upsert
MERGE [counter] AS t
USING (SELECT @userID AS ID) AS s
ON t.[ID] = s.[ID] AND t.[date] = @curDate
WHEN MATCHED THEN UPDATE SET t.[count] = t.[count]+1
WHEN NOT MATCHED THEN INSERT ([ID], [date], [count]) VALUES (@userID, @curDate, 1);
COMMIT TRANSACTION Upsert
SELECT @curCount = [count] FROM [counter] WHERE ID = @userID AND [date] = @curDate
IF @limit IS NOT NULL AND @curCount > @limit
BEGIN
SELECT 'False' as [Response], 'Request limit reached!' as [Reason]
END
ELSE
BEGIN
SELECT 'True' as [Response], NULL as [Reason]
END
END
END
I also think some locks are happening after introducing this SP.
The dupes aren't breaking anything, but I'm curious if it's something fundamentally wrong with my code or if I should set up a constraint on the table to prevent this. Thanks
Update 6/23/17: I dropped the MERGE statement and tried using @@ROWCOUNT, but it also caused dupes
BEGIN TRANSACTION Upsert
UPDATE [counter] SET [count] = [count]+1 WHERE [ID] = @userID AND [date] = @curDate
IF @@ROWCOUNT = 0 AND @@ERROR = 0
INSERT INTO [counter] ([ID], [date], [count]) VALUES (@userID, @curDate, 1)
COMMIT TRANSACTION Upsert
A HOLDLOCK hint on the update statement will avoid the race condition. To prevent deadlocks, I suggest a clustered composite primary key (or unique index) on ID and date.
The example below incorporates these changes and uses the SET <variable> = <column> = <expression> form of the SET clause to avoid the need for the subsequent SELECT of the final counter value and thereby improve performance.
ALTER PROCEDURE [dbo].[CheckKey]
@apikey AS VARCHAR(10)
AS
SET NOCOUNT ON;
--SET XACT_ABORT ON is a best practice for procs with explicit transactions
SET XACT_ABORT ON;
DECLARE
@userID as int
, @limit as int
, @curCount as int
, @curDate as Date = GETDATE();
BEGIN TRY;
SELECT
@userID = id
, @limit = limit
FROM [users]
WHERE apiKey = @apikey;
IF @userID IS NULL
BEGIN
--Key not found
SELECT 'False' as [Response], 'Invalid API key!' as [Reason];
END
ELSE
BEGIN
--Key found
BEGIN TRANSACTION Upsert;
UPDATE [counter] WITH(HOLDLOCK)
SET @curCount = [count] = [count] + 1
WHERE
[ID] = @userID
AND [date] = @curDate;
IF @@ROWCOUNT = 0
BEGIN
INSERT INTO [counter] ([ID], [date], [count])
VALUES (@userID, @curDate, 1);
END;
IF @limit IS NOT NULL AND @curCount > @limit
BEGIN
SELECT 'False' as [Response], 'Request limit reached!' as [Reason]
END
ELSE
BEGIN
SELECT 'True' as [Response], NULL as [Reason]
END;
COMMIT TRANSACTION Upsert;
END;
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0 ROLLBACK;
THROW;
END CATCH;
GO
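The composite primary key suggested in this answer could be added roughly as shown below. This is a sketch under assumptions: the constraint name is hypothetical, the dbo schema is assumed, both columns must already be NOT NULL, and any existing duplicate rows have to be merged or removed first or the ALTER TABLE will fail.

```sql
-- Step 1: list the (ID, date) pairs that currently have more than one row,
-- together with the combined count that a single merged row should carry.
SELECT [ID], [date], COUNT(*) AS copies, SUM([count]) AS total_count
FROM dbo.[counter]
GROUP BY [ID], [date]
HAVING COUNT(*) > 1;
GO
-- Step 2 (after the duplicates are cleaned up): enforce uniqueness.
-- PK_counter is a hypothetical constraint name.
ALTER TABLE dbo.[counter]
    ADD CONSTRAINT PK_counter PRIMARY KEY CLUSTERED ([ID], [date]);
```

Once the key exists, a racing second INSERT fails with a key violation instead of silently creating a duplicate row.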
Probably not the answer you're looking for but for a rate-limiting counter I would use a cache like Redis in a middleware before hitting the API. Performance-wise it's pretty great since Redis would have no problem with the load and your DB won’t be impacted.
And if you want to keep a history of hits per api key per day in SQL, run a daily task to import yesterday's counts from Redis to SQL.
The data-set would be small enough to get a Redis instance that would cost literally nothing (or close).
It will be the merge statement getting into a race condition with itself, i.e. your API is getting called by the same client and both times the merge statement finds no row so inserts one. Merge isn't an atomic operation, even though it's reasonable to assume it is. For example see this bug report for SQL 2008, about merge causing deadlocks, the SQL server team said this is by design.
From your post I think the immediate issue is that your clients will potentially be getting a small number of free hits on your API. For example, if two requests come in and see no row, you'll start with two rows with a count of 1 when you'd actually want one row with a count of 2, and the client could end up getting 1 free API hit that day. If three requests crossed over you'd get three rows with a count of 1 and they could get 2 free API hits, etc.
Edit
So as your link suggests you've got two categories of options you could explore, firstly just try and get this working in SQL server, secondly other architectural solutions.
For the SQL option I would do away with the merge and consider pre-populating your clients ahead of time, either nightly or less often for several days at a time, this will leave you a single update instead of the merge/update and insert. Then you can confirm both your update and your select are fully optimised, ie have the necessary index and that they aren't causing scans. Next you could look at tweaking locking so you're only locking at the row level, see this for some more info. For the select you could also look at using NOLOCK which means you could get slightly incorrect data but this shouldn't matter in your case, you'll be using a WHERE which targets a single row always as well.
For the non-SQL options, as your link says you could look at queuing things up, obviously these would be the updates/inserts so your selects would be seeing old data. This may or may not be acceptable depending on how far apart they are although you could have this as an "eventually consistent" solution if you wanted to be strict and charge extra or take off API hits the next day or something. You could also look at caching options to store the counts, this would get more complex if your app is distributed but there are caching solutions for this. If you went with caching you could choose to not persist anything but then you'd potentially give away a load of free hits if your site went down, but you'd probably have bigger issues to worry about then anyway!
At a high level, have you considered pursuing the following scenario?
Restructuring: Set the primary key on your table to be a composite of (ID, date). Possibly even better, just use the API key itself instead of the arbitrary ID you're assigning it.
Query A: Do SQL Server's equivalent of "INSERT IGNORE" (it seems there are semantic equivalents for SQL Server based on a Google search) with the values (ID, TODAY(), 1). You'll also want to specify a WHERE clause that checks the ID actually exists in your API/limits table.
Query B: Update the row with (ID, TODAY()) as its primary key, setting count := count + 1, and in the very same query, do an inner join with your limits table, so that in the WHERE clause you can specify that you'll only update the count if count < limit.
If the majority of your requests are valid API requests or rate-limited requests, I would perform queries in the following order on every request:
Run Query B.
  If 0 rows updated:
    Run query A.
    If 0 rows updated:
      Run query B.
      If 0 rows updated, reject because of rate limit.
      If 1 row updated, continue.
    If 1 row updated:
      continue.
  If 1 row updated:
    continue.
If the majority of your requests are invalid API requests, I'd do the following:
Run query A.
  If 0 rows updated:
    Run query B.
    If 0 rows updated, reject because of rate limit.
    If 1 row updated, continue.
  If 1 row updated:
    continue.
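Query A and Query B from this answer might look like the following in T-SQL, using the [users] and [counter] tables from the question. This is a sketch, not a tested implementation: it assumes the dbo schema, treats a NULL limit as "unlimited" as the original procedure does, and relies on the (ID, date) primary key to reject a racing duplicate insert. The example value for @userID is arbitrary.

```sql
DECLARE @userID int  = 1;            -- example value; normally looked up from the API key
DECLARE @today  date = GETDATE();

-- Query A: create today's row only if it is missing and the user exists.
INSERT INTO dbo.[counter] ([ID], [date], [count])
SELECT u.id, @today, 1
FROM dbo.[users] AS u
WHERE u.id = @userID
  AND NOT EXISTS (SELECT 1
                  FROM dbo.[counter] AS c WITH (UPDLOCK, HOLDLOCK)
                  WHERE c.[ID] = u.id
                    AND c.[date] = @today);

-- Query B: increment today's row, but only while the user is under the limit.
UPDATE c
SET c.[count] = c.[count] + 1
FROM dbo.[counter] AS c
JOIN dbo.[users] AS u ON u.id = c.[ID]
WHERE c.[ID] = @userID
  AND c.[date] = @today
  AND (u.[limit] IS NULL OR c.[count] < u.[limit]);
```

In a real procedure each statement would be followed by a check of @@ROWCOUNT to decide which branch of the flow above to take.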

Select and Delete in the same transaction using TOP clause

I have a table to which data is continuously being added at a rapid pace.
I need to fetch records from this table and immediately remove them so that I cannot process the same record a second time. Since the data is added so quickly, I need to use the TOP clause so that only a small number of records go to the business logic for processing at a time.
I am using the query below:
BEGIN TRAN readrowdata
SELECT
top 5 [RawDataId],
[RawData]
FROM
[TABLE] with(HOLDLOCK)
WITH q AS
(
SELECT
top 5 [RawDataId],
[RawData]
FROM
[TABLE] with(HOLDLOCK)
)
DELETE from q
COMMIT TRANSACTION readrowdata
I am using HOLDLOCK here so that new data cannot be inserted into the table while I am performing the SELECT and DELETE operations. I used it because, suppose there are only 3 records in the table now: the SELECT statement will get 3 records, and if a new record is inserted at the same moment, the DELETE statement will delete 4 records. So I will lose 1 record here.
Is the query OK in terms of performance? If I can improve it, please give me your suggestions.
Thank you
Personally, I'd use a different approach. One with less locking, but also extra information signifying that certain records are currently being processed...
DECLARE @rowsBeingProcessed TABLE (
id INT
);
WITH rows AS (
SELECT top 5 [RawDataId], processing_start FROM yourTable WHERE processing_start IS NULL
)
UPDATE rows SET processing_start = getDate()
OUTPUT INSERTED.RawDataId INTO @rowsBeingProcessed;
-- Business Logic Here
DELETE yourTable WHERE RawDataId IN (SELECT id FROM @rowsBeingProcessed);
Then you can also add checks like "if a record has been 'beingProcessed' for more than 10 minutes, assume that the business logic failed", etc, etc.
By locking the table in this way, you force other processes to wait for your transaction to complete. This can have very rapid consequences on scalability and performance - and it tends to be hard to predict, because there's often a chain of components all relying on your database.
If you have multiple clients each running this query, and multiple clients adding new rows to the table, the overall system performance is likely to deteriorate at some times, as each "read" client is waiting for a lock, the number of "write" clients waiting to insert data grows, and they in turn may tie up other components (whatever is generating the data you want to insert).
Diego's answer is on the money - put the data into a variable, and delete matching rows. Don't use locks in SQL Server if you can possibly avoid it!
You can do it very easily with triggers. The example below shows a setup that does not hold up other users who are trying to insert data at the same time:
Data Definition language
CREATE TABLE SampleTable
(
id int
)
Sample Record
insert into SampleTable(id)Values(1)
Sample Trigger
CREATE TRIGGER SampleTableTrigger
on SampleTable AFTER INSERT
AS
IF Exists(SELECT id FROM INSERTED)
BEGIN
Set NOCOUNT ON
SET XACT_ABORT ON
Begin Try
Begin Tran
Select ID From Inserted
DELETE From yourTable WHERE ID IN (SELECT id FROM Inserted);
Commit Tran
End Try
Begin Catch
Rollback Tran
End Catch
End
Hope this is very simple and helpful
If I understand you correctly, you are worried that between your select and your delete, more records would be inserted and the first TOP 5 would be different than the second TOP 5?
If so, why don't you load your first select into a temp table or variable (or at least the PKs), do whatever you have to do with your data, and then do your delete based on this table?
I know this is an old question, but I found a solution here https://www.simple-talk.com/sql/learn-sql-server/the-delete-statement-in-sql-server/:
DECLARE @Output table
(
StaffID INT,
FirstName NVARCHAR(50),
LastName NVARCHAR(50),
CountryRegion NVARCHAR(50)
);
DELETE SalesStaff
OUTPUT DELETED.* INTO @Output
FROM Sales.vSalesPerson sp
INNER JOIN dbo.SalesStaff ss
ON sp.BusinessEntityID = ss.StaffID
WHERE sp.SalesLastYear = 0;
SELECT * FROM @Output;
Maybe it will be helpful for you.
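Applied to the table from the question, the DELETE ... OUTPUT technique lets a single statement both remove a batch and return it, so there is no gap between a SELECT and a DELETE for new rows to slip into. The sketch below assumes the dbo schema and that RawDataId reflects arrival order; the READPAST hint is an addition that lets several readers work in parallel by skipping rows another reader has already locked.

```sql
-- Take up to 5 of the oldest rows, skipping rows locked by other readers.
WITH batch AS (
    SELECT TOP (5) [RawDataId], [RawData]
    FROM dbo.[TABLE] WITH (ROWLOCK, READPAST)
    ORDER BY [RawDataId]
)
-- Delete them and return the deleted rows to the caller in one statement.
DELETE FROM batch
OUTPUT deleted.[RawDataId], deleted.[RawData];
```

Because the rows are gone once the statement commits, a failure in the business logic afterwards loses that batch unless the caller wraps the delete and the processing in its own transaction.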

SQL Server Express 2005 - updating 2 tables and atomicity?

First off, I want to start by saying I am not an SQL programmer (I'm a C++/Delphi guy), so some of my questions might be really obvious. So pardon my ignorance :o)
I've been charged with writing a script that will update certain tables in a database based on the contents of a CSV file. I have it working it would seem, but I am worried about atomicity for one of the steps:
One of the tables contains only one field - an int which must be incremented each time, but from what I can see is not defined as an identity for some reason. I must create a new row in this table, and insert that row's value into another newly-created row in another table.
This is how I did it (as part of a larger script):
DECLARE @uniqueID INT,
@counter INT,
@maxCount INT
SELECT @maxCount = COUNT(*) FROM tempTable
SET @counter = 1
WHILE (@counter <= @maxCount)
BEGIN
SELECT @uniqueID = MAX(id) FROM uniqueIDTable <----Line 1
INSERT INTO uniqueIDTable VALUES (@uniqueID + 1) <----Line 2
SELECT @uniqueID = @uniqueID + 1
UPDATE TOP(1) tempTable
SET userID = @uniqueID
WHERE userID IS NULL
SET @counter = @counter + 1
END
GO
First of all, am I correct using a "WHILE" construct? I couldn't find a way to achieve this with a simple UPDATE statement.
Second of all, how can I be sure that no other operation will be carried out on the database between Lines 1 and 2 that would insert a value into the uniqueIDTable before I do? Is there a way to "synchronize" operations in SQL Server Express?
Also, keep in mind that I have no control over the database design.
Thanks a lot!
You can do the whole 9 yards in one single statement:
WITH cteUsers AS (
SELECT t.*
, ROW_NUMBER() OVER (ORDER BY userID) as rn
, COALESCE(m.id,0) as max_id
FROM tempTable t WITH(UPDLOCK)
JOIN (
SELECT MAX(id) as id
FROM uniqueIDTable WITH (UPDLOCK)
) as m ON 1=1
WHERE userID IS NULL)
UPDATE cteUsers
SET userID = rn + max_id
OUTPUT INSERTED.userID
INTO uniqueIDTable (id);
You get the MAX(id), lock the uniqueIDTable, compute sequential userIDs for users with NULL userID by using ROW_NUMBER(), update the tempTable and insert the new ids into uniqueIDTable. All in one operation.
For performance you need an index on uniqueIDTable(id) and an index on tempTable(userID).
SQL is all about set oriented operations, WHILE loops are the code smell of SQL.
You need a transaction to ensure atomicity and you need to move the select and insert into one statement or do the select with an updlock to prevent two people from running the select at the same time, getting the same value and then trying to insert the same value into the table.
Basically
DECLARE @MaxValTable TABLE (MaxID int)
BEGIN TRANSACTION
BEGIN TRY
-- Select and insert in one statement; the UPDLOCK/HOLDLOCK hints follow the note above about using an updlock
INSERT INTO uniqueIDTable (id)
OUTPUT inserted.id INTO @MaxValTable (MaxID)
SELECT COALESCE(MAX(id), 0) + 1 FROM uniqueIDTable WITH (UPDLOCK, HOLDLOCK)
UPDATE TOP(1) tempTable
SET userID = (SELECT MaxID FROM @MaxValTable)
WHERE userID IS NULL
COMMIT TRANSACTION
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION
RAISERROR('Error occurred updating tempTable', 16, 1) -- more detail here is good
END CATCH
That said, using an identity would make things far simpler. This is a potential concurrency problem. Is there any way you can change the column to be identity?
Edit: Ensuring that only one connection at a time will be able to insert into the uniqueIDtable. Not going to scale well though.
Edit: Table variable's better than exclusive table lock. If need be, this can be used when inserting users as well.

Resources