I've inherited the following SQL Server stored procedure (on Azure SQL Database):
PROCEDURE [Sources].[DeleteRawCaptures]
(@SourceId BIGINT,
 @Count INT OUTPUT)
AS
BEGIN
    SET NOCOUNT ON

    -- Limit the DELETE below to 500 rows per execution
    SET ROWCOUNT 500

    -- Find the batch of the most recent capture for this source
    DECLARE @BatchId uniqueidentifier
    SELECT @BatchId = CaptureBatch
    FROM [Data].[RawCaptures]
    WHERE CaptureId = (SELECT MAX(CaptureId)
                       FROM [Data].[RawCaptures]
                       WHERE SourceId = @SourceId)

    -- Delete captures for this source that are not in the latest batch
    DELETE FROM [Data].[RawCaptures]
    WHERE SourceId = @SourceId
      AND CaptureBatch <> @BatchId

    SET @Count = @@ROWCOUNT
END
After many months of this working without issue, it is timing out as of late. The former developer who built the solution, in the brief moment I was able to contact him, suggested it's possible there may be an additional index needed on the Data.RawCaptures table, as the last time he ran into errors many months ago, he needed to add another index. He was not any more specific than that, unfortunately.
I'm not well-versed enough in SQL Server indexes to determine what type of index, and on what columns, I should have in place to be certain the above stored procedure is running at its optimal ability.
These are the indexes that are currently in place:
Just to clarify in case the title of the index does not give enough information:
The first one's fields are CaptureBatch and Status
The second one's field is CaptureDateTime
The third one's fields are SourceId and CaptureBatch
The fourth one's fields are SourceId and Status
The fifth one's field is SourceId
The sixth one's field is Status
The seventh one's fields are Status and CaptureDateTime
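Expressed as DDL, those would look roughly like this (the index names here are mine; only the key columns come from the list above):
CREATE NONCLUSTERED INDEX IX_RawCaptures_CaptureBatch_Status    ON [Data].[RawCaptures] (CaptureBatch, Status);
CREATE NONCLUSTERED INDEX IX_RawCaptures_CaptureDateTime        ON [Data].[RawCaptures] (CaptureDateTime);
CREATE NONCLUSTERED INDEX IX_RawCaptures_SourceId_CaptureBatch  ON [Data].[RawCaptures] (SourceId, CaptureBatch);
CREATE NONCLUSTERED INDEX IX_RawCaptures_SourceId_Status        ON [Data].[RawCaptures] (SourceId, Status);
CREATE NONCLUSTERED INDEX IX_RawCaptures_SourceId               ON [Data].[RawCaptures] (SourceId);
CREATE NONCLUSTERED INDEX IX_RawCaptures_Status                 ON [Data].[RawCaptures] (Status);
CREATE NONCLUSTERED INDEX IX_RawCaptures_Status_CaptureDateTime ON [Data].[RawCaptures] (Status, CaptureDateTime);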
Just FYI, there are two other stored procedures that relate to this process and reference the other columns listed above that don't appear in the stored procedure I've posted.
The amount of data in Data.RawCaptures is beginning to increase exponentially. Because of this, I am wondering if I need yet another index for this table (perhaps even of a different type?) or if what I already have should have me well-covered. If the latter is the case, I will start investigating other avenues to determine the cause of the timeouts.
Unfortunately SQL Server Profiler is not supported in Azure SQL Database. I'll need to discover another approach.
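One Azure-friendly alternative is Query Store. The views below are standard, but the exact query is just a sketch of mine for pulling the slowest statements and their plans:
SELECT TOP 20
    qt.query_sql_text,
    rs.avg_duration,
    rs.count_executions,
    p.query_plan
FROM sys.query_store_runtime_stats rs
JOIN sys.query_store_plan p ON p.plan_id = rs.plan_id
JOIN sys.query_store_query q ON q.query_id = p.query_id
JOIN sys.query_store_query_text qt ON qt.query_text_id = q.query_text_id
ORDER BY rs.avg_duration DESC;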
The Query Execution Plan here:
https://www.brentozar.com/pastetheplan/?id=SJk4pHiw7
Related
EDITED: I have a table with a composite key which is being used by multiple Windows services deployed on multiple servers.
Columns:
UserId (int) [CompositeKey],
CheckinTimestamp (bigint) [CompositeKey],
Status (tinyint)
There will be continuous insertion into this table. I want my Windows service to select the top 10000 rows and do some processing while locking only those 10000 rows. I am using ROWLOCK for this in the stored procedure below:
ALTER PROCEDURE LockMonitoringSession
AS
BEGIN
    BEGIN TRANSACTION

    SELECT TOP 10000 *
    INTO #TempMonitoringSession
    FROM dbo.MonitoringSession WITH (ROWLOCK)
    WHERE [Status] = 0
    ORDER BY UserId

    DECLARE @UserId INT
    DECLARE @CheckinTimestamp BIGINT

    DECLARE SessionCursor CURSOR FOR
        SELECT UserId, CheckinTimestamp FROM #TempMonitoringSession

    OPEN SessionCursor
    FETCH NEXT FROM SessionCursor INTO @UserId, @CheckinTimestamp
    WHILE @@FETCH_STATUS = 0
    BEGIN
        UPDATE dbo.MonitoringSession
        SET [Status] = 1
        WHERE UserId = @UserId AND CheckinTimestamp = @CheckinTimestamp

        FETCH NEXT FROM SessionCursor INTO @UserId, @CheckinTimestamp
    END

    CLOSE SessionCursor
    DEALLOCATE SessionCursor

    SELECT * FROM #TempMonitoringSession
    DROP TABLE #TempMonitoringSession

    COMMIT TRANSACTION
END
But by doing so, dbo.MonitoringSession stays locked for the entire duration of the stored procedure. I am not sure what I am doing wrong here.
The only purpose of this stored procedure is to select and update 10000 recent rows, without any primary key, while ensuring the whole table is not locked, because multiple Windows services are accessing this table.
Thanks in advance for any help.
(not an answer, but too long for a comment)
The purpose description should explain why, and for what, you are updating the whole table. Your SP updates all rows with Status = 0 to Status = 1, so when one of your services decides to run it, all of those rows become irrelevant. Logically, the event that causes the status change has already occurred; you just need some time to physically record it in the database. So why would you want other services to read rows that are no longer relevant? OK, perhaps you need them to read only the rows still available (not yet changed), but again that isn't clear, because you are updating the whole table.
You can use the READPAST hint to skip locked rows, and you need row locks for that.
OK, but even when processing only the top N rows, updating those N rows with a single statement would be much faster than looping through them one at a time. You are doing the same job, just manually.
Check out this example of combining UPDLOCK + READPAST so that parallel processes can work the same queue: https://www.mssqltips.com/sqlservertip/1257/processing-data-queues-in-sql-server-with-readpast-and-updlock/
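As a rough sketch of that pattern applied to the table described above (column names taken from the question), a single statement can both claim the batch and return it:
WITH Batch AS
(
    SELECT TOP (10000) UserId, CheckinTimestamp, [Status]
    FROM dbo.MonitoringSession WITH (ROWLOCK, READPAST, UPDLOCK)
    WHERE [Status] = 0
    ORDER BY UserId
)
UPDATE Batch
SET [Status] = 1
OUTPUT inserted.UserId, inserted.CheckinTimestamp;
Because of READPAST, a second service running the same statement concurrently skips the rows the first one has locked and claims the next unprocessed rows instead.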
Small hint - CURSOR STATIC, READONLY, FORWARD_ONLY would do same thing as storing to temp table. Review STATIC option:
https://msdn.microsoft.com/en-us/library/ms180169.aspx
Another suggestion is to consider RCSI (read committed snapshot isolation). It will certainly stop other services from being blocked, but it is a database-level option, so you'll have to test all of your functionality. Most of it will work the same as before, but some scenarios need testing (concurrent transactions will no longer block in situations where they blocked before).
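If you go that route, the switch itself is a one-liner (the database name is a placeholder; test everything afterwards, as noted above):
ALTER DATABASE YourDatabase SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;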
Not clear to me:
what is the percentage of 10000 out of the total number of rows?
is there a clustered index, or is this a heap?
what is the actual execution plan for the select and the update?
what are concurrent transactions: inserts or selects?
By the way, I discovered a similar question:
why the entire table is locked while "with (rowlock)" is used in an update statement
Ok - so the answer is probably 'everything is within a transaction in SQL Server'. But here's my problem.
I have a stored procedure which returns data based on one input parameter. This is being called from an external service which I believe is Java-based.
I have a log-table to see how long each hit to this proc is taking. Kind of like this (simplified)...
CREATE TABLE log (ID INT PRIMARY KEY IDENTITY(1,1), Param VARCHAR(10), [In] DATETIME, [Out] DATETIME)

CREATE PROC troublesome
    (@param VARCHAR(10))
AS
BEGIN
    -- log the 'in' time and keep the ID
    DECLARE @logid INT
    INSERT log ([In], Param) VALUES (GETDATE(), @param)
    SET @logid = SCOPE_IDENTITY()

    -- the actual work (simplified)
    SELECT <some stuff from another table based on @param>

    -- set the 'out' time
    UPDATE log SET [Out] = GETDATE() WHERE ID = @logid
END
So far so easily-criticised by SQL fascists.
Anyway - when I call this, e.g. troublesome 'SOMEPARAM', I get the result of the SELECT back, and the log table gets a nice new row with SOMEPARAM and the in and out timestamps.
Now, I watch this table, and even though no rows appear to be going in, whenever I generate a row myself I can see that the ID has skipped many places. I guess this is caused by the external client code hitting the proc: they are getting the result of the SELECT, but I am not getting their log data.
This suggests to me that they are wrapping their call in a transaction and rolling it back, which is one thing that would cause this behaviour. I want to know:
If there is a way I can FORCE the write of the log even if it is contained within a transaction over which I have no control and which is subsequently rolled back (seems unlikely)
If there is a way I can determine from within this proc if it is executing within a transaction (and perhaps raise an error)
If there are other possible explanations for the ID skipping (and it's skipping a lot, like 1000 places in an hour, so I'm sure it's caused by the client code, as they are also reporting successfully retrieving the results of the SELECT.)
Thanks!
For the why rows are skipped part of the question, your assumption sounds plausible to me.
For how to force the write even within a transaction, I don't know.
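If it helps, here is a quick sketch (against a throwaway table) confirming that a rolled-back transaction still consumes identity values:
CREATE TABLE dbo.IdentityGapDemo (ID INT IDENTITY(1,1) PRIMARY KEY, Val VARCHAR(20));

BEGIN TRANSACTION;
INSERT INTO dbo.IdentityGapDemo (Val) VALUES ('rolled back');
ROLLBACK TRANSACTION;

INSERT INTO dbo.IdentityGapDemo (Val) VALUES ('committed');
-- The surviving row has ID = 2: the rolled-back insert consumed ID 1
SELECT ID, Val FROM dbo.IdentityGapDemo;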
As for how to discover whether the SP is running inside a transaction how about the following:
SELECT @@TRANCOUNT
I tested this out in a SQL batch and it seems to be OK. Maybe it can take you a bit further.
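For example (a sketch only; the error message and severity are illustrative), the check could go at the top of the proc:
IF @@TRANCOUNT > 0
BEGIN
    -- Called inside an open transaction: the log rows written here may be rolled back by the caller
    RAISERROR('troublesome was called inside an open transaction.', 16, 1);
END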
I have a stored procedure that does an insert of a row like this:
CREATE PROCEDURE dbo.sp_add_test
    @CreatedBy   NVARCHAR(128),
    @CreatedDate DATETIME,
    @TestId      INT
AS
BEGIN
    SET NOCOUNT ON

    INSERT INTO dbo.Test (
        CreatedDate,
        Title,
        ParentTestId
    )
    SELECT
        @CreatedDate,
        Title,
        @TestId
    FROM Test
    WHERE TestId = @TestId;

    SELECT * FROM Test
    WHERE TestId = @TestId
      AND CreatedDate = @CreatedDate;
END
When the row is inserted, a new identity value is generated for the primary key. As soon as the insert completes, I then do a select from that table.
Can someone tell me if there is another way I can do this? The reason I do a second select is that I need to get a value for the new TestId which is an identity column.
I am not familiar with the way SQL Server caches data. Does it cache recently used rows in the same way as Oracle does or will it go to the disk to get the row it just inserted?
In SQL Server, the right way to do this is with the OUTPUT clause. The documentation is here.
As far as I know, SQL Azure supports the OUTPUT clause.
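For reference, a sketch of how the OUTPUT clause could capture the new identity values in the procedure above (the table variable name is mine, and TestId is assumed to be the identity column, as described):
DECLARE @NewIds TABLE (TestId INT);

INSERT INTO dbo.Test (CreatedDate, Title, ParentTestId)
OUTPUT inserted.TestId INTO @NewIds
SELECT @CreatedDate, Title, @TestId
FROM dbo.Test
WHERE TestId = @TestId;

-- The newly generated identity values, without a second read of dbo.Test
SELECT TestId FROM @NewIds;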
As for your question, when a database commits a page (that is, when the insert is completed), the page often remains in memory. Typically, the "commit" is a log operation, so the data page remains in memory. An immediate access to the page should be fast, in the sense that it doesn't require a disk access. But the OUTPUT clause is the right approach in SQL Server.
I have a system which requires I have IDs on my data before it goes to the database. I was using GUIDs, but found them to be too big to justify the convenience.
I'm now experimenting with implementing a sequence generator which basically reserves a range of unique ID values for a given context. The code is as follows:
ALTER PROCEDURE [dbo].[Sequence.ReserveSequence]
    @Name varchar(100),
    @Count int,
    @FirstValue bigint OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    -- Ensure the parameters are valid
    IF (@Name IS NULL OR @Count IS NULL OR @Count < 0)
        RETURN -1;

    -- Reserve the sequence
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    BEGIN TRANSACTION

    -- Get the sequence ID, and the last reserved value of the sequence
    DECLARE @SequenceID int;
    DECLARE @LastValue bigint;

    SELECT TOP 1 @SequenceID = [ID], @LastValue = [LastValue]
    FROM [dbo].[Sequences]
    WHERE [Name] = @Name;

    -- Ensure the sequence exists
    IF (@SequenceID IS NULL)
    BEGIN
        -- Create the new sequence
        INSERT INTO [dbo].[Sequences] ([Name], [LastValue])
        VALUES (@Name, @Count);

        -- The first reserved value of a sequence is 1
        SET @FirstValue = 1;
    END
    ELSE
    BEGIN
        -- Update the sequence
        UPDATE [dbo].[Sequences]
        SET [LastValue] = @LastValue + @Count
        WHERE [ID] = @SequenceID;

        -- The sequence start value will be the last previously reserved value + 1
        SET @FirstValue = @LastValue + 1;
    END

    COMMIT TRANSACTION
END
The 'Sequences' table is just an ID, Name (unique), and the last allocated value of the sequence. Using this procedure I can request N values in a named sequence and use these as my identifiers.
This works great so far - it's extremely quick since I don't have to constantly ask for individual values, I can just use up a range of values and then request more.
The problem is that at extremely high frequency, calling the procedure concurrently can sometimes result in a deadlock. I have only found this to occur when stress testing, but I'm worried it'll crop up in production. Are there any notable flaws in this procedure, and can anyone recommend any way to improve on it? It would be nice to do this without transactions, for example, but I do need it to be 'thread safe'.
Microsoft themselves offer a solution, and even they say it locks/deadlocks.
If you add some lock hints, you'd reduce concurrency under your high loads.
Options:
You could develop against the "Denali" CTP which is the next release
Use IDENTITY and the OUTPUT clause like everyone else
Adopt/modify the solutions above
On DBA.SE there is "Emulate a TSQL sequence via a stored procedure": see dportas' answer which I think extends the MS solution.
I'd recommend sticking with the GUIDs if, as you say, this is mostly about composing data ready for a bulk insert (it's simpler than what I present below).
As an alternative, could you work with a restricted count? Say, 100 ID values at a time? In that case, you could have a table with an IDENTITY column, insert into that table, return the generated ID (say, 39), and then your code could assign all values between 3900 and 3999 (e.g. multiply up by your assumed granularity) without consulting the database server again.
Of course, this could be extended to allocating multiple IDs in a single call - provided that you're okay with some IDs potentially going unused. E.g. you need 638 IDs, so you ask the database to assign you 7 new ID values (which implies that you've allocated 700 values), use the 638 you want, and the remaining 62 never get assigned.
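A minimal sketch of that idea, with made-up table and column names and a granularity of 100:
-- One row per reserved block; the IDENTITY value is the only thing the database hands out
CREATE TABLE dbo.IdBlocks (BlockId INT IDENTITY(1,1) PRIMARY KEY, ReservedAt DATETIME NOT NULL DEFAULT GETDATE());

DECLARE @BlockId INT;
INSERT INTO dbo.IdBlocks DEFAULT VALUES;
SET @BlockId = SCOPE_IDENTITY();

-- The caller now owns IDs (@BlockId * 100) through (@BlockId * 100 + 99) without touching the database again
SELECT @BlockId * 100 AS FirstId, @BlockId * 100 + 99 AS LastId;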
Can you get some kind of deadlock trace? For example, enable trace flag 1222 as shown here. Duplicate the deadlock. Then look in the SQL Server log for the deadlock trace.
Also, you might inspect what locks are taken out in your code by inserting a call to exec sp_lock or select * from sys.dm_tran_locks immediately before the COMMIT TRANSACTION.
Most likely you are observing a conversion deadlock. To avoid them, you want to make sure that your table is clustered and has a PK, but this advice is specific to 2005 and 2008 R2, and they can change the implementation, rendering this advice useless. Google up "Some heap tables may be more prone to deadlocks than identical tables with clustered indexes".
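For the table described in the question ("just an ID, Name (unique), and the last allocated value"), that might look like this sketch, with types assumed from the procedure:
CREATE TABLE dbo.Sequences
(
    ID        INT IDENTITY(1,1) NOT NULL,
    [Name]    VARCHAR(100)      NOT NULL,
    LastValue BIGINT            NOT NULL,
    CONSTRAINT PK_Sequences PRIMARY KEY CLUSTERED (ID),
    CONSTRAINT UQ_Sequences_Name UNIQUE ([Name])
);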
Anyway, if you observe an error during stress testing, it is likely that sooner or later it will occur in production as well.
You may want to use sp_getapplock to serialize your requests. Google up "Application Locks (or Mutexes) in SQL Server 2005". Also I described a few useful ideas here: "Developing Modifications that Survive Concurrency".
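A sketch of what serializing the reservation with sp_getapplock could look like (the resource name is arbitrary):
BEGIN TRANSACTION;

-- Only one caller at a time can hold this lock; it is released when the transaction ends
EXEC sp_getapplock @Resource = 'ReserveSequence', @LockMode = 'Exclusive', @LockOwner = 'Transaction';

-- ... read and update [dbo].[Sequences] here, as in the original procedure ...

COMMIT TRANSACTION;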
I thought I'd share my solution. It doesn't deadlock, nor does it produce duplicate values. An important difference between this and my original procedure is that it doesn't create the sequence if it doesn't already exist:
ALTER PROCEDURE [dbo].[ReserveSequence]
(
    @Name nvarchar(100),
    @Count int,
    @FirstValue bigint OUTPUT
)
AS
BEGIN
    SET NOCOUNT ON;

    IF (@Count <= 0)
    BEGIN
        SET @FirstValue = NULL;
        RETURN -1;
    END

    DECLARE @Result TABLE ([LastValue] bigint)

    -- Update the sequence last value, and get the previous one
    UPDATE [Sequences]
    SET [LastValue] = [LastValue] + @Count
    OUTPUT DELETED.LastValue INTO @Result
    WHERE [Name] = @Name;

    -- Select the first value
    SELECT TOP 1 @FirstValue = [LastValue] + 1 FROM @Result;
END
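Example call, assuming a row for a 'Customers' sequence already exists in [Sequences]:
DECLARE @First BIGINT;
EXEC [dbo].[ReserveSequence] @Name = N'Customers', @Count = 500, @FirstValue = @First OUTPUT;

-- @First .. @First + 499 are now reserved for this caller
SELECT @First AS FirstReservedValue, @First + 499 AS LastReservedValue;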
I have a simple procedure that selects 1000 rows from a table. For the sake of clarity, here's what the stored procedure looks like:
alter procedure dbo.GetDomainForIndexing
    @Amount int = 1,
    @LastID bigint,
    @LastFetchDate datetime
as
begin
    select top (@Amount) *
    from DomainName with (readuncommitted)
    where LastUpdated > @LastFetchDate
      and ID > @LastID
      and ContainsAdultWords is not null
    order by ID
end
I've been having issues where this particular procedure runs fine a bunch of times, the only difference being that a different @LastID value is passed in each time. As soon as I get to a specific ID, though, the procedure returns the first 880 rows almost instantly (this is happening in Management Studio) and then sits there and literally stalls for the next 6 minutes before returning the remaining 120 rows.
What on earth could cause behaviour like this? There are no transactions associated with the connection and there are no connection pool issues. The (readuncommitted) bit does not affect the issue. The issue occurs both from within my application and when I copy the command text into SQL Management Studio for testing; indeed it is there that I discovered this weird stalling behaviour. Initially I was just trying to work out why this procedure would work fine a bunch of times and then suddenly start stalling for no apparent reason.
Any ideas?
UPDATE
The issue also occurs (stalling after 880 rows have been returned) when asking for 883 rows, but not when asking for 882 rows.
UPDATE 2
Selecting from sys.sysprocesses and sys.dm_exec_requests indicates a lastwaittype of PAGEIOLATCH_SH. What should I do?
Quite likely "parameter sniffing" and a suboptimal cached plan for the offending @LastID value.
Try this:
DECLARE @ILastID bigint
SET @ILastID = @LastID

select top (@Amount) *
from DomainName with (readuncommitted)
where LastUpdated > @LastFetchDate
  and ID > @ILastID
  and ContainsAdultWords is not null
order by ID
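An alternative workaround for the same sniffing problem (not from the original answer, mentioned only as an option) is to compile a fresh plan on every call:
select top (@Amount) *
from DomainName with (readuncommitted)
where LastUpdated > @LastFetchDate
  and ID > @LastID
  and ContainsAdultWords is not null
order by ID
option (recompile)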
Another option:
What does sysprocesses report as the last wait type? ASYNC_NETWORK_IO?
If so, then the client can't keep up with the SQL Server output.
Have you tried rebuilding the indexes on DomainName?
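For example (object name taken from the question; pick whichever is lighter for your maintenance window):
ALTER INDEX ALL ON dbo.DomainName REBUILD;

-- or, if a full rebuild is too heavy, refresh the statistics instead
UPDATE STATISTICS dbo.DomainName WITH FULLSCAN;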