Below is an outline of a SQL execution framework design using Service Broker that I have been playing with. I've outlined the process and asked some questions throughout (highlighted as block quotes), and I would be interested in hearing any advice on the design.
Overview
I have an ETL operation that needs to take data out of 5 databases and move it into 150 using select/insert statements or stored procedures. The result is about 2,000 individual queries, each taking between 1 second and 1 hour.
Each SQL query inserts data only. There is no need for data to be returned.
The operation can be broken up into 3 steps:
Pre-ETL
ETL
Post-ETL
The queries in each step can be executed in any order, but the steps have to stay in order.
Method
I am using Service Broker for asynchronous/parallel execution.
Any advice on how to tune Service Broker (e.g. specific options to look at, or guidance for setting the number of queue workers)?
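For reference, the knob this question is mostly about is MAX_QUEUE_READERS in the queue's ACTIVATION clause, which caps how many copies of the activation procedure run in parallel. A minimal sketch, assuming the queue described below is named UnprocessedQueue (the outline does not give its exact name):

ALTER QUEUE dbo.UnprocessedQueue
WITH ACTIVATION (
    STATUS = ON,
    PROCEDURE_NAME = dbo.ProcessUnprocessedQueue,
    MAX_QUEUE_READERS = 8,   -- upper bound on parallel activation workers; tune to the workload
    EXECUTE AS OWNER
);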
Service Broker Design
Initiator
The initiator sends an XML message containing the SQL query to the Unprocessed queue, which has an activation stored procedure called ProcessUnprocessedQueue. This process is wrapped in a try/catch inside a transaction, rolling back the transaction when there is an exception.
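A minimal sketch of such an initiator, assuming hypothetical service, contract, and message type names (the outline does not give them):

BEGIN TRY
    BEGIN TRANSACTION;
    DECLARE @h UNIQUEIDENTIFIER,
            @msg xml = N'<Query>...</Query>';   -- placeholder payload
    BEGIN DIALOG CONVERSATION @h
        FROM SERVICE [EtlInitiatorService]
        TO SERVICE 'EtlTargetService'
        ON CONTRACT [EtlContract]
        WITH ENCRYPTION = OFF;
    SEND ON CONVERSATION @h MESSAGE TYPE [EtlQuery] (@msg);
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    THROW;
END CATCH;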
ProcessUnprocessedQueue
ProcessUnprocessedQueue passes the XML to the procedure ExecSql.
ExecSql - SQL Execution and Logging
ExecSql then handles the SQL execution and logging:
The XML is parsed, along with any other data about the execution that is going to be logged
Before the execution, a logging entry is inserted
If the transaction is started in the initiator, can I ensure the log entry insert is always committed, even when the outer transaction in the initiator is rolled back?
Something like SAVE TRANSACTION is not valid here, correct?
Or should I not manipulate the transaction here at all: execute the query in a try/catch and, if it goes to the catch, insert a log entry for the exception and re-throw it, since we are in the middle of the transaction? (See the sketch after this list.)
The query is executed
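A sketch of the TRY/CATCH variant raised in the last question above; the log table and its columns are assumptions:

-- hypothetical log table: dbo.EtlLog(LogId int identity, QueryText, StartedAt, FinishedAt, ErrorMessage)
DECLARE @sql nvarchar(max) = N'/* the SQL parsed from the XML message */';
DECLARE @LogId int;
BEGIN TRY
    INSERT INTO dbo.EtlLog (QueryText, StartedAt)
    VALUES (@sql, SYSDATETIME());
    SET @LogId = SCOPE_IDENTITY();

    EXEC sys.sp_executesql @sql;

    UPDATE dbo.EtlLog SET FinishedAt = SYSDATETIME() WHERE LogId = @LogId;
END TRY
BEGIN CATCH
    UPDATE dbo.EtlLog
    SET ErrorMessage = ERROR_MESSAGE(), FinishedAt = SYSDATETIME()
    WHERE LogId = @LogId;
    THROW;   -- re-throw; note that if the initiator's outer transaction rolls back, these log rows roll back with it
END CATCH;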
Alternative Logging Solution?
I need to log:
The SQL query executed
Metadata about the operation
The time it takes for each process to finish
This is why I insert one row at the start and one at the end of the process
Any exceptions, if they exist
Would it be better to have an In-Memory OLTP table that contains the query information? I would INSERT a row before the start of an operation and then do an UPDATE or INSERT to log exceptions and execution times. After the batch is done, I would archive the data into a disk-based table to prevent the in-memory table from getting too big.
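If you go that route, a minimal sketch of the memory-optimized table (names and the durability setting are assumptions; it also requires a MEMORY_OPTIMIZED_DATA filegroup):

CREATE TABLE dbo.EtlLogHot (
    LogId int IDENTITY NOT NULL PRIMARY KEY NONCLUSTERED,
    QueryText nvarchar(4000) NOT NULL,
    StartedAt datetime2 NOT NULL,
    FinishedAt datetime2 NULL,
    ErrorMessage nvarchar(4000) NULL
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);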
ProcessUnprocessedQueue - Manually process the results
After the execution, ProcessUnprocessedQueue gets back an updated version of the XML (used to determine whether the execution was successful, plus other data about the transaction for post-processing) and sends that message to the ProcessedQueue. The ProcessedQueue does not have an activation procedure, so it can be processed manually (I need to know when a batch of queries has finished executing).
Processing the Queries
Since the ETL can be broken out into 3 steps, I create 3 XML variables to which I add all of the queries needed for the ETL operation, so I end up with something like this:
@preEtlQueue xml
200 queries
@etlQueue xml
1500 queries
@postEtlQueue xml
300 queries
Why XML?
The XML queue variable is passed between different stored procedures as an OUTPUT parameter that updates its values and/or adds SQL queries to it. This variable needs to be both written and read, so an alternative could be something like a global temp table or a persistent table.
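A sketch of that OUTPUT-parameter pattern, with a hypothetical procedure name and XML shape:

CREATE PROCEDURE dbo.AddQueryToQueue
    @queue xml OUTPUT,        -- e.g. <Queue><Query>...</Query></Queue>
    @sql nvarchar(max)
AS
BEGIN
    -- append the query text as a new <Query> element
    SET @queue.modify('insert <Query>{sql:variable("@sql")}</Query> as last into (/Queue)[1]');
END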
I then process the XML variables:
Use a cursor to loop through the queries and send them to the service broker service.
Each group of queries contained in the XML variable is sent under the same conversation_group_id.
Values such as the to/from service, message type, etc. are all stored in the XML variable.
After the messages are sent to Service Broker, use a while loop to continuously check the ProcessedQueue until all the messages have been processed.
This implements a timeout to avoid an infinite loop; a sketch of the completion check follows below.
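A sketch of that completion check, assuming the ProcessedQueue from the outline and a known message count:

DECLARE @results TABLE (
    conversation_handle uniqueidentifier,
    message_type_name sysname NULL,
    message_body varbinary(max) NULL);
DECLARE @processed int = 0,
        @expected int = 1500,   -- number of messages sent for this step
        @deadline datetime2 = DATEADD(minute, 120, SYSDATETIME());

WHILE @processed < @expected AND SYSDATETIME() < @deadline
BEGIN
    WAITFOR (
        RECEIVE conversation_handle, message_type_name, message_body
        FROM ProcessedQueue INTO @results
    ), TIMEOUT 5000;   -- wake periodically so the deadline is re-checked
    SET @processed = @processed + @@ROWCOUNT;
END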
I'm thinking of redesigning this. Should I add an activation procedure on ProcessedQueue and have it insert the processed results into a physical table? If I did that, I would poll the results table rather than using RECEIVE in a WHILE loop to check for processed items. Does that have any disadvantages?
I haven't built anything as massive as what you are doing now, but I will share what worked for me, along with some general opinions...
My preference is to avoid In-Memory OLTP, write everything to durable tables, and keep the message queue as clean as possible.
Use the fastest possible drives in the server: write speed equivalent to NVMe or faster, RAID 10, etc.
I grab every message off the queue as soon as it arrives and write it to a table named "mqMessagesReceived" (see the code below; my all-purpose MQ handler is named mqAsyncQueueMessageOnCreate).
I use a trigger on the "mqMessagesReceived" table that does a lookup to find which stored procedure to execute to process each unique message (see the code below).
Each message has an identifier (in my case, the originating table name that wrote the message to the queue), and this identifier is used as the key for a lookup query run inside the trigger of the mqMessagesReceived table, to figure out which subsequent stored procedure needs to run to process each received message correctly.
Before sending a message on the MQ, you can build a generic identifier variable on the calling side (e.g. if a trigger is putting messages onto the MQ):
SELECT @tThisTableName = OBJECT_NAME(parent_object_id) FROM sys.objects
WHERE sys.objects.name = OBJECT_NAME(@@PROCID)
AND SCHEMA_NAME(sys.objects.schema_id) = OBJECT_SCHEMA_NAME(@@PROCID);
A configuration table holds the lookup data for matching a table name with the stored procedure that needs to run to process the MQ data that arrived and was written to the mqMessagesReceived table.
Here is the definition of that lookup table
CREATE TABLE [dbo].[mqMessagesConfig](
[ID] [int] IDENTITY(1,1) NOT NULL,
[tSourceTableReceived] [nvarchar](128) NOT NULL,
[tTriggeredStoredProcedure] [nvarchar](128) NOT NULL,
CONSTRAINT [PK_mqMessagesConfig] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Here is the activation stored procedure that gets run as a message hits the queue
CREATE PROCEDURE [dbo].[mqAsyncQueueMessageOnCreate]
AS
BEGIN
    SET NOCOUNT ON
    DECLARE
        @h UNIQUEIDENTIFIER,
        @t sysname,
        @b varbinary(200),
        @hand VARCHAR(36),
        @body VARCHAR(2000),
        @sqlcleanup nvarchar(MAX)
    -- Get all of the messages on the queue
    -- the WHILE loop is infinite, until BREAK is received when we get a null handle
    WHILE 1=1
    BEGIN
        SET @h = NULL
        --Note the semicolon..!
        ;RECEIVE TOP(1)
            @h = conversation_handle,
            @t = message_type_name,
            @b = message_body
        FROM mqAsyncQueue
        --No message found (handle is now null)
        IF @h IS NULL
        BEGIN
            -- all messages are now processed, but we still have the @hand variable saved from processing the last message
            SET @sqlcleanup = 'EXEC [mqConversationsClearOne] @handle = N' + char(39) + @hand + char(39) + ';';
            EXECUTE(@sqlcleanup);
            BREAK
        END
        --mqAsyncMessage message type received
        ELSE IF @t = 'mqAsyncMessage'
        BEGIN
            SET @hand = CONVERT(varchar(36),@h);
            SET @body = CONVERT(varchar(2000),@b);
            INSERT mqMessagesReceived (tMessageType, tMessageBody, tMessageBinary, tConversationHandle)
            VALUES (@t, @body, @b, @hand);
        END
        --unknown message type was received that we don't understand
        ELSE
        BEGIN
            INSERT mqMessagesReceived (tMessageBody, tMessageBinary)
            VALUES ('Unknown message type received', CONVERT(varbinary(MAX), 'Unknown message type received'))
        END
    END
END
CREATE PROCEDURE [dbo].[mqConversationsClearOne]
    @handle varchar(36)
AS
-- Note: you can check the queue by running this query
-- SELECT * FROM sys.conversation_endpoints
-- SELECT * FROM sys.conversation_endpoints WHERE NOT([State] = 'CO')
-- CO = conversing [State]
DECLARE @getid CURSOR
    ,@sql NVARCHAR(MAX)
    ,@conv_id NVARCHAR(100)
    ,@conv_handle NVARCHAR(100)
-- want to create a chain of statements like this, one per conversation
-- END CONVERSATION 'FE851F37-218C-EA11-B698-4CCC6AD00AE9' WITH CLEANUP;
-- END CONVERSATION 'A4B4F603-208C-EA11-B698-4CCC6AD00AE9' WITH CLEANUP;
SET @getid = CURSOR FOR
    SELECT [conversation_id], [conversation_handle]
    FROM sys.conversation_endpoints
    WHERE conversation_handle = @handle;
OPEN @getid
FETCH NEXT
FROM @getid INTO @conv_id, @conv_handle
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = 'END CONVERSATION ' + char(39) + @conv_handle + char(39) + ' WITH CLEANUP;'
    EXEC sys.sp_executesql @stmt = @sql;
    FETCH NEXT
    FROM @getid INTO @conv_id, @conv_handle --, @conv_service
END
CLOSE @getid
DEALLOCATE @getid
and the table named "mqMessagesReceived" has this trigger
CREATE TRIGGER [dbo].[mqMessagesReceived_TriggerUpdate]
ON [dbo].[mqMessagesReceived]
AFTER INSERT
AS
BEGIN
    DECLARE
        @strMessageBody nvarchar(4000),
        @strSourceTable nvarchar(128),
        @strSourceKey nvarchar(128),
        @strConfigStoredProcedure nvarchar(4000),
        @sqlRunStoredProcedure nvarchar(4000),
        @strErr nvarchar(4000)
    SELECT @strMessageBody = ins.tMessageBody FROM INSERTED ins;
    SELECT @strSourceTable = (select txt_Value from dbo.fn_ParseText2Table(@strMessageBody,'|') WHERE Position=2);
    SELECT @strSourceKey = (select txt_Value from dbo.fn_ParseText2Table(@strMessageBody,'|') WHERE Position=3);
    -- look in mqMessagesConfig to find the name of the final stored procedure
    -- to run against the SourceTable
    -- e.g. @strConfigStoredProcedure = mqProcess-tblLabDaySchedEventsMQ
    SELECT @strConfigStoredProcedure =
        (select tTriggeredStoredProcedure from dbo.mqMessagesConfig WHERE tSourceTableReceived = @strSourceTable);
    SET @sqlRunStoredProcedure = 'EXEC [' + @strConfigStoredProcedure + '] @iKey = ' + @strSourceKey + ';';
    EXECUTE(@sqlRunStoredProcedure);
    INSERT INTO [mqMessagesProcessed]
    (
        [tMessageBody],
        [tSourceTable],
        [tSourceKey],
        [tTriggerStoredProcedure]
    )
    VALUES
    (
        @strMessageBody,
        @strSourceTable,
        @strSourceKey,
        @sqlRunStoredProcedure
    );
END
Also, here is some general SQL Server tuning advice that I found I also had to apply (for dealing with a busy database).
By default there is just one single TempDB data file per SQL Server instance, and TempDB has an initial size of 8MB.
However, TempDB gets reset back to that initial 8MB size every time the server reboots, and this company was rebooting the server every weekend via cron/task scheduler.
The problem we saw was a slow database and lots of record locks, but only first thing Monday morning, when everyone was hammering the database at once as they began their work week.
While TempDB is being automatically resized it is locked, so nobody at all can use that single TempDB (which is why the SQL Server was regularly becoming non-responsive).
By Friday the TempDB had grown to over 300MB.
So, to follow the best-practice recommendation, I created one TempDB file per vCPU (8 files in my case), distributed them across the two available hard drives on that server, and, most importantly, set their initial size to more than we need (I chose 200MB each).
This fixed the problem with the SQL Server slowdown and record locking that was experienced every Monday morning.
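For reference, a sketch of that TempDB change (file names, paths, sizes, and the file count are assumptions; repeat ADD FILE up to one file per vCPU):

-- resize the existing primary data file so it survives restarts at a useful size
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 200MB, FILEGROWTH = 64MB);

-- add data files, spread across the available drives
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2, FILENAME = 'D:\TempDB\tempdev2.ndf', SIZE = 200MB, FILEGROWTH = 64MB);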
Related
I am working on a mutation test framework for SQL Server. For this I need to be able to calculate which lines of a stored procedure, function or trigger are executed when I execute a certain stored procedure.
The difficult part is that I want to know the exact lines or statements being executed from the stored procedure I call.
With a query like the following I can see which stored procedures/triggers/functions have been executed; since I know when I called the stored procedure, I can use the execution time to see whether it was executed.
SELECT d.object_id, d.database_id,
OBJECT_NAME(object_id, database_id) AS proc_name,
MAX( d.last_execution_time) as last_execution_time,
OBJECT_DEFINITION(object_id) as definition
FROM sys.dm_exec_procedure_stats AS d
WHERE d.database_id = DB_ID()
GROUP BY d.object_id, d.database_id,
OBJECT_NAME(object_id, database_id)
How would I find the lines/statements that have been executed? I also have to know inside which stored procedure/trigger/function the lines/statements exist, and in which schema that object lives. I have to take into account that IF/ELSE statements may be used.
With this data I can do 2 important things:
generate a code coverage report
optimize which lines to mutate, since I don't have to mutate uncovered lines.
A possible, but not very nice, solution would be to automatically change stored procedures to add a line that inserts the previous line into a table, but this would require splitting the procedure up into statements, which I don't know how to do.
Please note that I cannot change the code users want to test with my framework. I can search for patterns and replace, but manually changing procedures is NOT an option.
EDIT:
Let's redefine this question: how do you split a stored procedure definition into its different statements in a way that does not depend on code style?
And how do you add a new statement in between the found statements?
EDIT: in the SO post SQL Server: How to parse code into its different statements I have found a way to trace statement execution, but I can't filter it yet.
So extended events are the solution. This is how I have done it:
IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name='testMSSQLTrace')
    DROP EVENT SESSION testMSSQLTrace ON SERVER;

DECLARE @cmd VARCHAR(MAX) = '';
SELECT @cmd = 'CREATE EVENT SESSION testMSSQLTrace
    ON SERVER
    ADD EVENT sqlserver.sp_statement_completed
        (WHERE (sqlserver.database_name = N''' + DB_NAME() + '''))
    ADD TARGET package0.ring_buffer
    WITH (
        MAX_MEMORY = 2048 KB,
        EVENT_RETENTION_MODE = NO_EVENT_LOSS,
        MAX_DISPATCH_LATENCY = 3 SECONDS,
        MAX_EVENT_SIZE = 0 KB,
        MEMORY_PARTITION_MODE = NONE,
        TRACK_CAUSALITY = OFF,
        STARTUP_STATE = OFF
    );'
EXEC (@cmd)
This creates an event session that can fire after every statement completion; it is created dynamically so it can filter on the current database.
Then I have 3 procedures that make controlling this event session easy:
/*******************************************************************************************
Starts the statement trace
*******************************************************************************************/
CREATE OR ALTER PROC testMSSQL.Private_StartTrace
AS
BEGIN
ALTER EVENT SESSION testMSSQLTrace
ON SERVER
STATE = START;
END
GO
/*******************************************************************************************
Ends the statement trace, this also clears the trace
*******************************************************************************************/
CREATE OR ALTER PROC testMSSQL.Private_StopTrace
AS
BEGIN
ALTER EVENT SESSION testMSSQLTrace
ON SERVER
STATE = STOP;
END
GO
/*******************************************************************************************
Saves the statements trace
*******************************************************************************************/
CREATE OR ALTER PROC testMSSQL.Private_SaveTrace
AS
BEGIN
    DECLARE @xml XML;
    SELECT @xml = CAST(xet.target_data AS xml)
    FROM sys.dm_xe_session_targets AS xet
    INNER JOIN sys.dm_xe_sessions AS xe ON (xe.address = xet.event_session_address)
    WHERE xe.name = 'testMSSQLTrace'

    INSERT INTO testMSSQL.StatementInvocations (testProcedure, procedureName, lineNumber, statement)
    SELECT testMSSQL.GetCurrentTest(),
        OBJECT_NAME(T.c.value('(data[@name="object_id"]/value)[1]', 'int')),
        T.c.value('(data[@name="line_number"]/value)[1]', 'int'),
        T.c.value('(data[@name="statement"]/value)[1]', 'VARCHAR(900)')
    FROM @xml.nodes('RingBufferTarget/event') T(c)
    WHERE T.c.value('(data[@name="nest_level"]/value)[1]', 'int') > 3
END
GO
These procedures respectively start and stop the trace; the last one stores the result in a table, filtering on the nest level so that my own code is not traced.
Finally I use it a bit like this:
start trace
start tran/savepoint
run SetUp (user's code)
run test (user's code)
save trace
save trace to a variable
rollback tran (also catch errors and the like)
save the variable back to the table so the trace is not rolled back (sketched below)
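A sketch of that flow using the procedures above; the user-code procedure names and the @trace column types are assumptions:

EXEC testMSSQL.Private_StartTrace;
BEGIN TRAN;

EXEC dbo.SetUp;           -- user's code (hypothetical name)
EXEC dbo.TestSomething;   -- user's code (hypothetical name)

EXEC testMSSQL.Private_SaveTrace;

-- copy the trace into a table variable; table variables survive a rollback
DECLARE @trace TABLE (testProcedure nvarchar(128), procedureName nvarchar(128), lineNumber int, statement varchar(900));
INSERT INTO @trace
SELECT testProcedure, procedureName, lineNumber, statement
FROM testMSSQL.StatementInvocations;

ROLLBACK TRAN;   -- undo the user's changes; this also rolls back StatementInvocations

-- write the preserved trace back
INSERT INTO testMSSQL.StatementInvocations (testProcedure, procedureName, lineNumber, statement)
SELECT testProcedure, procedureName, lineNumber, statement FROM @trace;

EXEC testMSSQL.Private_StopTrace;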
Special thanks to @Jeroen Mosterd for originally coming up with a proposal for this solution in the SQL Server: How to parse code into its different statements SO post.
You can either:
Add a @Debug parameter to each stored procedure you call, or
Log everything you want, or
Only log when you want.
With the @Debug parameter, you can default it to 0 (off), then pass 1 when you want to trace your statements, with code like the following:
IF (@Debug = 1) PRINT 'your tracing information goes here';
If you want to log everything, create a log table and insert a row into it wherever you need to know which statement was executed, such as:
DECLARE @log AS TABLE (msg VARCHAR(MAX));
and
INSERT INTO @log VALUES('your tracing information goes here');
Or you can combine them:
IF (@Debug = 1) INSERT INTO @log VALUES('your tracing information goes here');
Of course these will affect performance even when you don't output/log.
My logical schema is as follows:
A header record can have multiple child records.
Multiple PCs can be inserting Child records, via a stored procedure that accepts details about the child record, and a value.
When a child record is inserted, a header record may need to be inserted if one doesn't exist with the specified value.
You only ever want one header record inserted for any given "value". So if two child records are inserted with the same "Value" supplied, the header should only be created once. This requires concurrency management during inserts.
Multiple PCs can be querying unprocessed header records, via a stored procedure
A header record needs to be queried if it has a specific set of child records, and the header record is unprocessed.
You only ever want one PC to query and process each header record. There should never be an instance where a header record and its children are processed by more than one PC. This requires concurrency management during selects.
So basically my header query looks like this:
BEGIN TRANSACTION;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

SELECT TOP 1 *
INTO #unprocessed
FROM Header h WITH (READPAST, UPDLOCK)
JOIN Child part1 ON part1.HeaderID = h.HeaderID AND part1.Name = 'XYZ'
JOIN Child part2 ON part1.HeaderID = part2.HeaderID
WHERE h.Processed = 0x0;

UPDATE Header
SET Processed = 0x1
WHERE HeaderID IN (SELECT [HeaderID] FROM #unprocessed);

SELECT * FROM #unprocessed

COMMIT TRAN
So the above query ensures that concurrent queries never return the same record.
I think my problem is on the insert query. Here's what I have:
DECLARE @HeaderID INT

BEGIN TRAN

--Create header record if it doesn't exist, otherwise get its HeaderID
MERGE INTO
    Header WITH (HOLDLOCK) as target
USING
(
    SELECT
        [Value] = @Value --stored procedure parameter
) as source ([Value]) ON target.[Value] = source.[Value] AND
                         target.[Processed] = 0
WHEN MATCHED THEN
    UPDATE SET
        --Get the ID of the existing header
        @HeaderID = target.[HeaderID],
        [LastInsert] = sysdatetimeoffset()
WHEN NOT MATCHED THEN
    INSERT
    (
        [Value]
    )
    VALUES
    (
        source.[Value]
    );

--Get new or existing ID
SELECT @HeaderID = COALESCE(@HeaderID, SCOPE_IDENTITY());

--Insert child with the new or existing HeaderID
INSERT INTO
    [Correlation].[CorrelationSetPart]
(
    [HeaderID],
    [Name]
)
VALUES
(
    @HeaderID,
    @Name --stored procedure parameter
);
My problem is that the insert query is often blocked by the above selection query, and I'm receiving timeouts. The selection query is called by a broker, so it can be called fairly frequently. Is there a better way to do this? Note that I have control over the database schema.
To answer the second part of the question
You only ever want one PC to query and process each header record.
There should never be an instance where a header record and its
children are processed by more than one PC.
Have a look at sp_getapplock.
I use app locks in a similar scenario. I have a table of objects that must be processed, similar to your table of headers. The client application runs several threads simultaneously. Each thread executes a stored procedure that returns the next object for processing from the table of objects. So, the main task of the stored procedure is not to do the processing itself, but to return the first object in the queue that needs processing.
The code may look something like this:
CREATE PROCEDURE [dbo].[GetNextHeaderToProcess]
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    BEGIN TRANSACTION;
    BEGIN TRY
        DECLARE @VarHeaderID int = NULL;
        DECLARE @VarLockResult int;
        EXEC @VarLockResult = sp_getapplock
            @Resource = 'GetNextHeaderToProcess_app_lock',
            @LockMode = 'Exclusive',
            @LockOwner = 'Transaction',
            @LockTimeout = 60000,
            @DbPrincipal = 'public';

        IF @VarLockResult >= 0
        BEGIN
            -- Acquired the lock
            -- Find the most suitable header for processing
            SELECT TOP 1
                @VarHeaderID = h.HeaderID
            FROM
                Header h
                JOIN Child part1 ON part1.HeaderID = h.HeaderID AND part1.Name = 'XYZ'
                JOIN Child part2 ON part1.HeaderID = part2.HeaderID
            WHERE
                h.Processed = 0x0
            ORDER BY ....;
            -- sorting is optional, but often useful
            -- for example, order by some timestamp to process oldest/newest headers first

            -- Mark the found Header to prevent multiple processing.
            UPDATE Header
            SET Processed = 2 -- in progress. Another procedure that performs the actual processing should set it to 1 when processing is complete.
            WHERE HeaderID = @VarHeaderID;

            -- There is no need to explicitly verify if we found anything.
            -- If @VarHeaderID is null, no rows will be updated
        END;

        -- Return found Header, or no rows if nothing was found, or failed to acquire the lock
        SELECT
            @VarHeaderID AS HeaderID
        WHERE
            @VarHeaderID IS NOT NULL;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        ROLLBACK TRANSACTION;
    END CATCH;
END
This procedure should be called from the procedure that does actual processing. In my case the client application does the actual processing, in your case it may be another stored procedure. The idea is that we acquire the app lock for the short time here. Of course, if the actual processing is fast you can put it inside the lock, so only one header can be processed at a time.
Once the lock is acquired we look for the most suitable header to process and then set its Processed flag. Depending on the nature of your processing you can set the flag to 1 (processed) right away, or set it to some intermediary value, like 2 (in progress) and then set it to 1 (processed) later. In any case, once the flag is not zero the header will not be chosen for processing again.
These app locks are separate from the normal locks that the DB takes when reading and updating rows, and they should not interfere with inserts. In any case, it should be better than locking the whole table as you do with WITH (UPDLOCK).
Returning to the first part of the question
You only ever want one header record inserted for any given "value".
So if two child records are inserted with the same "Value" supplied,
the header should only be created once.
You can use the same approach: acquire an app lock at the beginning of the inserting procedure (with a different name than the app lock used in the querying procedure). That guarantees inserts happen sequentially, not simultaneously. BTW, in practice inserts most likely can't happen simultaneously anyway: the DB performs them sequentially internally. They wait for each other, because each insert locks the table for update, and each insert is written to the transaction log, where all writes are sequential too. So just add sp_getapplock to the beginning of your inserting procedure and remove the WITH (HOLDLOCK) hint from the MERGE.
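A minimal sketch of that change, reusing the sp_getapplock pattern from the procedure above with a different lock name (the resource name is arbitrary):

BEGIN TRANSACTION;
DECLARE @VarLockResult int;
EXEC @VarLockResult = sp_getapplock
    @Resource = 'InsertHeader_app_lock',   -- must differ from the querying procedure's lock
    @LockMode = 'Exclusive',
    @LockOwner = 'Transaction',
    @LockTimeout = 60000,
    @DbPrincipal = 'public';

IF @VarLockResult < 0
BEGIN
    ROLLBACK TRANSACTION;
    RETURN;   -- could not get the lock; the caller decides whether to retry
END;

-- the MERGE and child-insert logic from the question goes here,
-- with the WITH (HOLDLOCK) hint removed

COMMIT TRANSACTION;   -- ending the transaction also releases the app lock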
The caller of the GetNextHeaderToProcess procedure should handle correctly the situation when procedure returns no rows. This can happen if the lock acquisition timed out, or there are simply no more headers to process. Usually the processing part simply retries after a while.
The inserting procedure should check whether the lock acquisition failed and either retry the insert or report the problem to the caller somehow. I usually return the generated identity ID of the inserted row (the ChildID in your case) to the caller; if the procedure returns 0, the insert failed and the caller may decide what to do.
I have created a "queue" of sorts in SQL, and I want to be able to set an item as invisible to semi-simulate an Azure-like queue (instead of deleting it immediately, in the event the worker fails to process it, it will automatically appear in the queue again for another worker to fetch).
As per recommendation from this SO: Is T-SQL Stored Procedure Execution 'atomic'?
I wrapped BEGIN TRAN and COMMIT around my spDeQueue procedure, but I'm still running into duplicate pulls from my test agents. (They are all trying to empty a queue of 10 items simultaneously, and I'm getting duplicate reads, which I shouldn't.)
This is my sproc
ALTER PROCEDURE [dbo].[spDeQueue]
    @invisibleDuration int = 30,
    @priority int = null
AS
BEGIN
    begin tran

    declare @now datetime = GETDATE()

    -- get the queue item
    declare @id int =
    (
        select top 1
            [Id]
        from
            [Queue]
        where
            ([InvisibleUntil] is NULL or [InvisibleUntil] <= @now)
            and (@priority is NULL or [Priority] = @priority)
        order by
            [Priority],
            [Created]
    )

    -- set the invisible and viewed count
    update
        [Queue]
    set
        [InvisibleUntil] = DATEADD(second, @invisibleDuration, @now),
        [Viewed] = [Viewed] + 1
    where
        [Id] = @id

    -- fetch the entire item
    select
        *
    from
        [Queue]
    where
        [Id] = @id

    commit

    return 0
END
What should I do to ensure this acts atomically and prevents duplicate dequeues?
Thanks
Your transaction (i.e. the statements between BEGIN TRAN and COMMIT) is atomic in the sense that either all of the statements are committed to the database, or none of them are.
It appears you have transactions mixed up with synchronization / mutually exclusive execution.
Have a read about transaction isolation levels, which should help enforce sequential execution; repeatable read might do the trick. http://en.wikipedia.org/wiki/Isolation_(database_systems)
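For what it's worth, a common alternative pattern (a sketch, not what the answer above suggests) is to claim the row in a single UPDATE with READPAST and UPDLOCK hints, so concurrent callers skip rows another session has already claimed. This uses the Queue table and the parameters of the spDeQueue procedure from the question:

UPDATE q
SET [InvisibleUntil] = DATEADD(second, @invisibleDuration, GETDATE()),
    [Viewed] = [Viewed] + 1
OUTPUT inserted.*   -- returns the claimed row to the caller
FROM (
    SELECT TOP (1) *
    FROM [Queue] WITH (ROWLOCK, READPAST, UPDLOCK)
    WHERE ([InvisibleUntil] IS NULL OR [InvisibleUntil] <= GETDATE())
      AND (@priority IS NULL OR [Priority] = @priority)
    ORDER BY [Priority], [Created]
) AS q;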
I use SQL Service Broker with internal activation to hand a list of jobs off to the internally activated stored procedure, without keeping the main thread/requestor waiting for the actual individual jobs to finish. Essentially, I'm trying to free up the UI thread. The problem is, I passed 2,000+ jobs to Service Broker; the messages reached the queue in about 25 minutes and freed the UI, but even after an hour it had only finished close to 600+ jobs.
I use the query below to count the number of jobs waiting to be completed, and the processing looks extremely slow:
SELECT COUNT(*)
FROM [HMS_Test].[dbo].[HMSTargetQueueIntAct]
WITH(NOLOCK)
Below is my activation stored procedure for your reference. Can someone please have a look and let me know what's wrong with it? How can I get Service Broker to finish the items on the queue quickly? Thanks in advance :)
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[sp_SB_HMSTargetActivProc]
AS
BEGIN
    DECLARE @RecvReqDlgHandle UNIQUEIDENTIFIER;
    DECLARE @RecvReqMsg NVARCHAR(1000);
    DECLARE @RecvReqMsgName sysname;
    DECLARE @XMLPtr int
    DECLARE @ExecuteSQL nvarchar(1000)
    DECLARE @CallBackSP nvarchar(100)
    DECLARE @CallBackSQL nvarchar(1000)
    DECLARE @SBCaller nvarchar(50)
    DECLARE @LogMsg nvarchar(1000)

    WHILE (1=1)
    BEGIN
        BEGIN TRANSACTION;
        WAITFOR
        ( RECEIVE TOP(1)
            @RecvReqDlgHandle = conversation_handle,
            @RecvReqMsg = message_body,
            @RecvReqMsgName = message_type_name
          FROM HMSTargetQueueIntAct
        ), TIMEOUT 5000;

        IF (@@ROWCOUNT = 0)
        BEGIN
            ROLLBACK TRANSACTION;
            BREAK;
        END

        IF @RecvReqMsgName = N'//HMS/InternalAct/RequestMessage'
        BEGIN
            DECLARE @ReplyMsg NVARCHAR(100);
            SELECT @ReplyMsg = N'<ReplyMsg>ACK Message for Initiator service.</ReplyMsg>';

            SEND ON CONVERSATION @RecvReqDlgHandle
                MESSAGE TYPE [//HMS/InternalAct/ReplyMessage]
                (@ReplyMsg);

            EXECUTE sp_xml_preparedocument @XMLPtr OUTPUT, @RecvReqMsg
            SELECT @ExecuteSQL = ExecuteSQL
                ,@CallBackSP = CallBackSP
                ,@SBCaller = SBCaller
            FROM OPENXML(@XMLPtr, 'RequestMsg/CommandParameters', 1)
            WITH (ExecuteSQL nvarchar(1000) 'ExecuteSQL'
                 ,CallBackSP nvarchar(1000) 'CallBackSP'
                 ,SBCaller nvarchar(50) 'SBCaller'
                 )
            EXEC sp_xml_removedocument @XMLPtr

            IF ((@ExecuteSQL IS NOT NULL) AND (LEN(@ExecuteSQL)>0))
            BEGIN
                SET @LogMsg='ExecuteSQL:' + @ExecuteSQL
                EXECUTE(@ExecuteSQL);
                SET @LogMsg='ExecuteSQLSuccess:' + @ExecuteSQL
                EXECUTE sp_LogSystemTransaction @SBCaller,@LogMsg,'SBMessage',0,''
            END

            IF ((@CallBackSP IS NOT NULL) AND (LEN(@CallBackSP)>0))
            BEGIN
                SET @CallBackSQL = @CallBackSP + ' @Sender=''sp_SB_HMSTargetActivProc'', @Res=''' + @ExecuteSQL + ''''
                SET @LogMsg='CallBackSQL:' + @CallBackSQL
                EXECUTE sp_LogSystemTransaction @SBCaller,@LogMsg,'SBMessage',0,''
                EXECUTE(@CallBackSQL);
            END
        END
        ELSE IF @RecvReqMsgName = N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog'
        BEGIN
            SET @LogMsg='MessageEnd:';
            END CONVERSATION @RecvReqDlgHandle WITH CLEANUP;
        END
        ELSE IF @RecvReqMsgName = N'http://schemas.microsoft.com/SQL/ServiceBroker/Error'
        BEGIN
            DECLARE @message_body VARBINARY(MAX);
            DECLARE @code int;
            DECLARE @description NVARCHAR(3000);
            DECLARE @xmlMessage XML;

            SET @xmlMessage = CAST(@RecvReqMsg AS XML);
            SET @code = (
                SELECT @xmlMessage.value(
                    N'declare namespace
                    brokerns="http://schemas.microsoft.com/SQL/ServiceBroker/Error";
                    (/brokerns:Error/brokerns:Code)[1]',
                    'int')
            );
            SET @description = (
                SELECT @xmlMessage.value(
                    'declare namespace
                    brokerns="http://schemas.microsoft.com/SQL/ServiceBroker/Error";
                    (/brokerns:Error/brokerns:Description)[1]',
                    'nvarchar(3000)')
            );

            IF (@code = -8462)
            BEGIN
                SET @LogMsg='MessageEnd:';
                --EXECUTE sp_LogSystemTransaction @SBCaller,@LogMsg,'SBMessage',0,'';
                END CONVERSATION @RecvReqDlgHandle WITH CLEANUP;
            END
            ELSE
            BEGIN
                SET @LogMsg='ERR:' + @description + ' ' + CAST(@code AS VARCHAR(20));
                EXECUTE sp_LogSystemTransaction @SBCaller,@LogMsg,'SBError',0,'';
                END CONVERSATION @RecvReqDlgHandle;
            END
        END
        COMMIT TRANSACTION;
    END
END
One thing I noticed is that this procedure doesn't seem to do much. Most of the lines of code are devoted to crafting a reply message for the Service Broker dialog.
That said, the existence of Service Broker means that you don't have to use sp_xml_preparedocument for your XML needs. Take a look at XQuery. In short, something like this should work:
-- assumes the message body is received into (or cast to) an XML variable, since .value() is an XML-type method
SELECT @ExecuteSQL = @RecvReqMsg.value('(RequestMsg/CommandParameters/ExecuteSQL)[1]', 'nvarchar(1000)')
    ,@CallBackSP = @RecvReqMsg.value('(RequestMsg/CommandParameters/CallBackSP)[1]', 'nvarchar(1000)')
    ,@SBCaller = @RecvReqMsg.value('(RequestMsg/CommandParameters/SBCaller)[1]', 'nvarchar(1000)')
Secondly, it looks like the messages contain SQL to be executed in the context of the database that contains the queue. What is the performance profile of those statements? That is, are they your bottleneck? If they are slow, adding Service Broker to the mix won't magically make things fast.
Thirdly, are you allowing more than one instance of the activation procedure to run at a time? Check the max_readers column in sys.service_queues to answer this. If it's set to 1 and your process needn't run serially, increase that number to process messages in parallel.
Fourthly, it looks like you've written your activation procedure to process only one message before completing. Check out the example in this tutorial and notice the WHILE (1=1) loop; that makes the activation procedure go back to the queue for another message once it's finished with the current one.
Lastly, why do you care? Service broker is an inherently asynchronous technology. If something/someone is waiting for a given message to be processed, I'd question that.
First and foremost, I would recommend treating this as a performance issue and approaching it as any other performance issue: measure. See How to analyse SQL Server performance for a brief introduction and for explicit advice on how to measure waits, IO, and CPU (overall, per session, or per statement) and how to identify bottlenecks. Once you know where the bottleneck is, you can consider ways to address it.
Now for something more specific to SSB. I would say that your procedure has three components that are interesting for the question:
the queue processing, i.e. the RECEIVE and END CONVERSATION
the message parsing (the XML shredding)
the execution (EXECUTE(@ExecuteSQL))
For queue processing I recommend Writing Service Broker Procedures for how to speed things up. RECEIVE TOP(1) is the slowest possible approach. Processing in batches is faster, even much faster, if possible. To dequeue a batch you need correlated messages in the queue, which means SEND-ing many messages on a single conversation handle; see Reusing Conversations. This may complicate the application significantly, so I would strongly urge you to measure and determine the bottleneck before making such drastic changes. A sketch of batch dequeue follows below.
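A sketch of batch dequeue into a table variable, using the queue name from the question (this only pays off once messages share conversations):

DECLARE @batch TABLE (
    conversation_handle uniqueidentifier,
    message_type_name sysname NULL,
    message_body varbinary(max) NULL);

WAITFOR (
    RECEIVE conversation_handle, message_type_name, message_body
    FROM HMSTargetQueueIntAct INTO @batch   -- dequeues the available conversation group at once
), TIMEOUT 5000;

-- then process all rows in @batch in one set-based pass instead of one message per iteration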
For the XML shredding I concur with @BenThul: using the XML data type methods is better than using the MSXML procedures.
And finally there is the EXECUTE(@ExecuteSQL). This is a black box for us; only you know what is actually being executed: not only how expensive/complex the executed SQL is, but also how likely it is to block. Lock contention between this background execution and your front-end code could slow down queue processing a great deal. Again, measure and you will know. As a side note: from the numbers you posted, I would expect the problem to be here. In my experience an activated procedure that does exactly what you do (RECEIVE TOP(1), XML parsing, SEND a response), without the EXECUTE, should run at a rate of about 100 messages per second and drain your queue of 2,000 jobs in about 20 seconds. You observe a much slower rate, which would make me suspect the SQL actually being executed.
Finally, the easy thing to try: bump up MAX_QUEUE_READERS (again, as @BenThul already pointed out):
ALTER QUEUE HMSTargetQueueIntAct WITH ACTIVATION (MAX_QUEUE_READERS = 5)
This will allow parallel processing of requests.
You are also missing proper error handling in your procedure; you should have a BEGIN TRY/BEGIN CATCH block. See Error Handling in Service Broker procedures, Error Handling and Activation, Handling exceptions that occur during the RECEIVE statement in activated procedures, and Exception handling and nested transactions.
I am having a problem and I want to ask some opinions on the best approach to accomplish this task.
The process I am implementing is a purge solution for big databases. The tables can have a variable number of rows, from a few thousand to many millions. The process deletes just the data for a specific year. Partitioning is not an option here.
I have some initial key tables that were chosen to start the process, and from these tables I recursively delete data using stored procedures built for this purpose (in order to maintain data integrity and delete only the necessary data).
I am using SSIS to be able to start the data deletion in multiple tables at the same time, but I am having problems with locks.
The deletion process has the following steps:
Drop all indexes on the table
Delete data in batches
Rebuild the primary key indexes (this needs to be done for performance, because some tables have non-clustered primary keys)
I have tried:
Using transactions and named transactions (still have locks)
Using sp_getapplock (still have locks)
In SSIS I have serializable as the default isolation for the transactions.
My main question is this: because I have to do some DML and DDL in the same transaction, and this will take some time to execute, I want all tasks that need to obtain a certain lock to wait until the lock is released rather than being chosen as a deadlock victim (much like a mutex).
The database will have no activity during this operation, only this process will be executed.
This is the code I use to delete the data, and it is where the deadlocks come from:
select @query = N'
    BEGIN TRANSACTION;
    DECLARE @result int;
    EXEC @result = sp_getapplock @Resource = ''[dbo].'+@dep_tbl_nm+''', @LockMode = ''Exclusive'', @LockOwner = ''Session'', @LockTimeout = -1;
    EXEC CreateIndexes ''dbo.'+@dep_tbl_nm+'''
    EXEC CreateCoveringIndexes '''+@tbl_nm+''','''+@dep_tbl_nm+''','''+@dep_tmp_tbl_nm+'''
    WHILE(1=1)
    BEGIN
        DELETE TOP(50000) FROM a OUTPUT deleted.* into '+@dep_tmp_tbl_nm+' FROM dbo.'+@dep_tbl_nm+' AS A INNER JOIN '+@tmp_tbl_nm+' AS B ON '+@on_list+'
        if(@@ROWCOUNT = 0)
            break;
    END
    exec (''ALTER INDEX ALL ON '+@dep_tbl_nm+' REBUILD WITH (FILLFACTOR = 80)'')
    COMMIT TRANSACTION;'
print(@query)
exec(@query)
Comments are welcome
Regards