Capture TSQL when Missing_Join_Predicate event occurs - sql-server

I am configuring Event Notifications on a Service Broker Queue to log when various performance related events occur. One of these is Missing_Join_Predicate.
The XML payload of this event holds nothing that helps me identify the cause (T-SQL, query plan, object IDs, etc.). So in the procedure that processes the queue, I am trying to use the TransactionID to query sys.dm_exec_requests and sys.dm_exec_query_plan for the query plan and the T-SQL, matching sys.dm_exec_requests.transaction_id to the TransactionID from the event.
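A sketch of the lookup described above, assuming the activated procedure has already pulled the event's TransactionID into a variable (`@TransactionID` is a placeholder, not a name from the question). The DMV column is `transaction_id`, and `OUTER APPLY` keeps requests whose handles happen to be NULL:

```sql
-- @TransactionID is assumed to hold the <TransactionID> value from the event.
DECLARE @TransactionID bigint;
SET @TransactionID = 0;   -- placeholder value

SELECT r.session_id,
       r.transaction_id,
       st.text       AS sql_text,
       qp.query_plan
FROM sys.dm_exec_requests AS r
OUTER APPLY sys.dm_exec_sql_text(r.sql_handle)    AS st
OUTER APPLY sys.dm_exec_query_plan(r.plan_handle) AS qp
WHERE r.transaction_id = @TransactionID;
```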
The code captures no data.
Removing the filter from the query (i.e. collecting all rows from sys.dm_exec_requests and sys.dm_exec_query_plan) shows that records are returned, but none for the TransactionID in question.
Is what I am trying to do possible? Where am I going wrong?!

The trace-based event notifications, like MISSING_JOIN_PREDICATE, are just a verbatim translation of the corresponding trace event (the Missing Join Predicate event class) and carry exactly the same info. For this particular event there's basically no useful info whatsoever: the <TransactionID> is the ID of the transaction that triggered the event and, by the time you dequeue and process the notification message, that transaction is most likely finished and gone (yay for asynchronous queue-based processing).
When using the original trace event, e.g. with Profiler, you could also enable SQL:BatchCompleted, filter appropriately and then catch the JOIN culprit in the act. But with Event Notifications I don't see any feasible way to automate the process to the point at which you can pinpoint the problem query and application. With EN you can, at best, raise awareness of the problem, show the client that caused it (the app), and then use other means (e.g. code inspection) to actually hunt down the root cause.
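To "show the client that caused it", the activated procedure can at least keep the identifying fields of each notification. A minimal sketch, assuming the MISSING_JOIN_PREDICATE payload carries `ApplicationName`, `HostName`, `LoginName` and `DatabaseName` elements like the corresponding trace columns (worth verifying against a real message first); the table, procedure and queue names are hypothetical:

```sql
-- Hypothetical log table and queue name (dbo.PerfEventQueue).
CREATE TABLE dbo.MissingJoinPredicateLog (
    PostTime        datetime      NULL,
    SPID            int           NULL,
    ApplicationName nvarchar(256) NULL,
    HostName        nvarchar(256) NULL,
    LoginName       nvarchar(256) NULL,
    DatabaseName    nvarchar(256) NULL
);
GO
CREATE PROCEDURE dbo.usp_LogPerfEvents
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @body xml;

    WHILE 1 = 1
    BEGIN
        BEGIN TRANSACTION;

        WAITFOR (
            RECEIVE TOP (1) @body = CAST(message_body AS xml)
            FROM dbo.PerfEventQueue
        ), TIMEOUT 1000;

        IF @@ROWCOUNT = 0
        BEGIN
            ROLLBACK TRANSACTION;   -- nothing left in the queue: stop
            BREAK;
        END

        -- Keep only what the payload can tell us: who, and from where.
        IF @body.value('(/EVENT_INSTANCE/EventType)[1]', 'nvarchar(128)') = N'MISSING_JOIN_PREDICATE'
            INSERT INTO dbo.MissingJoinPredicateLog
                (PostTime, SPID, ApplicationName, HostName, LoginName, DatabaseName)
            VALUES (
                @body.value('(/EVENT_INSTANCE/PostTime)[1]',        'datetime'),
                @body.value('(/EVENT_INSTANCE/SPID)[1]',            'int'),
                @body.value('(/EVENT_INSTANCE/ApplicationName)[1]', 'nvarchar(256)'),
                @body.value('(/EVENT_INSTANCE/HostName)[1]',        'nvarchar(256)'),
                @body.value('(/EVENT_INSTANCE/LoginName)[1]',       'nvarchar(256)'),
                @body.value('(/EVENT_INSTANCE/DatabaseName)[1]',    'nvarchar(256)')
            );

        COMMIT TRANSACTION;
    END
END;
```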
Unfortunately you'll discover that there are more event notification events that suffer from similar problem(s).

Related

What does SQL Server sys.dm_broker_activated_tasks tell me?

I inherited a Service Broker application that abends with a false negative.
I think it is using sys.dm_broker_activated_tasks incorrectly, and I want to validate that my understanding of what that view shows is correct.
Can I assume that this view shows tasks being activated, and not so much those that were activated but are now in the process of completing?
The procedure I have monitors for completion of processing by looking for when there are no entries in sys.dm_broker_activated_tasks for that queue.
This appears to work (mostly), except occasionally at the end when processing in the queue is winding down.
The row in that view seems to disappear before the final message in the queue has completed.
And unfortunately, as this uses the fire-and-forget anti-pattern, I can't really do more at this time than make the polling monitor a bit smarter.
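One possible "smarter" poll, as a heuristic sketch: combine the activated-task count with the number of messages still in the queue and the queue monitor state from sys.dm_broker_queue_monitors. `dbo.TargetQueue` is a placeholder, and treating "0 tasks, 0 messages, monitor INACTIVE" over a couple of consecutive polls as "done" is an assumption, not a documented completion signal:

```sql
-- Run in the database that owns the (placeholder) queue dbo.TargetQueue.
DECLARE @queue_id int;
SET @queue_id = OBJECT_ID(N'dbo.TargetQueue');

SELECT
    (SELECT COUNT(*)
       FROM sys.dm_broker_activated_tasks AS t
      WHERE t.database_id = DB_ID()
        AND t.queue_id    = @queue_id)       AS activated_tasks,
    (SELECT COUNT(*) FROM dbo.TargetQueue)   AS messages_in_queue,
    (SELECT m.state                          -- INACTIVE / NOTIFIED / RECEIVES_OCCURRING
       FROM sys.dm_broker_queue_monitors AS m
      WHERE m.database_id = DB_ID()
        AND m.queue_id    = @queue_id)       AS monitor_state;
```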
That view doesn't do much apart from:
Returns a row for each stored procedure activated by Service Broker.
https://msdn.microsoft.com/en-us/library/ms175029.aspx
Not sure if you have looked at the code, but I think a better usage of it is to combine it with sys.dm_exec_sessions
select
    at.spid,
    DB_NAME(at.database_id) AS [DatabaseName],
    at.queue_id,
    at.[procedure_name],
    s.[status],
    s.login_time
from sys.dm_broker_activated_tasks at
inner join sys.dm_exec_sessions s
    on at.spid = s.session_id;
Another good place to troubleshoot Service Broker is sys.transmission_queue. You will see every message sent there until there is an acknowledgement that it was received.
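A small example of looking at sys.transmission_queue; `transmission_status` usually carries the reason a message has not been delivered yet:

```sql
-- Messages still waiting for delivery or acknowledgement, oldest first.
SELECT to_service_name,
       enqueue_time,
       message_type_name,
       is_end_of_dialog,
       transmission_status
FROM sys.transmission_queue
ORDER BY enqueue_time;
```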

Sql Server Service Broker - thorough, in-use example of externally activated console app

I need some guidance from anyone who has deployed a real-world, in-production application that uses the Sql Server Service Broker external activation mechanism (via the Service Broker External Activator from the Feature Pack).
Current mindset:
My specs are rather simple (or at least I think so), so I'm thinking of the following basic flow:
1. an order-like entity gets inserted into Table_Orders with state "confirmed"
2. SP_BeginOrder gets executed and does the following:
   2.1. begins a TRANSACTION
   2.2. starts a DIALOG from Service_HandleOrderState to Service_PreprocessOrder
   2.3. stores the conversation handle (from now on PreprocessingHandle) in a specific column of the Orders table
   2.4. sends a MESSAGE of type Message_PreprocessOrder containing the order id using PreprocessingHandle
   2.5. ends the TRANSACTION
   Note that I'm not ending the conversation; I don't want "fire-and-forget".
3. an event notification on Queue_PreprocessOrder activates an instance of PreprocessOrder.exe (max concurrent of 1), which does the following:
   3.1. begins a SqlTransaction
   3.2. receives top 1 MESSAGE from Queue_PreprocessOrder
   3.3. if the message type is Message_PreprocessOrder (format XML):
        3.3.1. sets the order state to "preprocessing" in Table_Orders using the order id in the message body
        3.3.2. loads n collections of data and computes their n-ary Cartesian product (via LINQ; AFAIK this is not possible in T-SQL) to determine the order items collection
        3.3.3. inserts the order item rows into Table_OrderItems
        3.3.4. sends a MESSAGE of type Message_PreprocessingDone, containing the same order id, using PreprocessingHandle
        3.3.5. ends the conversation pertaining to PreprocessingHandle
   3.4. commits the SqlTransaction
   3.5. exits with Environment.Exit(0)
4. internal activation on Queue_HandleOrderState executes an SP (max concurrent of 1) that:
   4.1. begins a TRANSACTION
   4.2. receives top 1 MESSAGE from Queue_InitiatePreprocessOrder
   4.3. if the message type is Message_PreprocessingDone:
        4.3.1. sets the order state to "processing" in Table_Orders using the order id in the message body
        4.3.2. starts a DIALOG from Service_HandleOrderState to Service_ProcessOrderItem
        4.3.3. stores the conversation handle (from now on ProcessOrderItemsHandle) in a specific column of Table_Orders
        4.3.4. creates a cursor for rows in Table_OrderItems for the current order id and, for each row:
               4.3.4.1. sends a MESSAGE of type Message_ProcessOrderItem, containing the order item id, using ProcessOrderItemsHandle
   4.4. if the message type is Message_ProcessingDone:
        4.4.1. sets the order state to "processed" in Table_Orders using the order id in the message body
   4.5. if the message type is http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog (END DIALOG):
        4.5.1. ends the conversation pertaining to the conversation handle of the message
   4.6. ends the TRANSACTION
5. an event notification on Queue_ProcessOrderItem activates an instance of ProcessOrderItem.exe (max concurrent of 1), which does the following:
   5.1. begins a SqlTransaction
   5.2. receives top 1 MESSAGE from Queue_ProcessOrderItem
   5.3. if the message type is Message_ProcessOrderItem (format XML):
        5.3.1. sets the order item state to "processing" in Table_OrderItems using the order item id in the message body, then:
               5.3.1.1. loads a collection of order item parameters
               5.3.1.2. makes an HttpRequest to a URL using the parameters
               5.3.1.3. stores the HttpResponse as a PDF on the filesystem
        5.3.2. if any errors occurred in the above substeps, sets the order item state to "error", otherwise "ok"
        5.3.3. performs a lookup in Table_OrderItems to determine if all order items are processed (state is "ok" or "error")
        5.3.4. if all order items are processed:
               5.3.4.1. sends a MESSAGE of type Message_ProcessingDone, containing the order id, using ProcessOrderItemsHandle
               5.3.4.2. ends the conversation pertaining to ProcessOrderItemsHandle
   5.4. commits the SqlTransaction
   5.5. exits with Environment.Exit(0)
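As a rough T-SQL sketch of what SP_BeginOrder describes, assuming a contract named `Contract_PreprocessOrder`, a `PreprocessingHandle` column and an `OrderId` key column (all hypothetical), and, for simplicity, that Table_Orders is reachable from the database holding the services (with the DB1/DB2 split described in the notes that follow, the UPDATE would need a three-part name). Variables are declared and then SET separately to stay within SQL Server 2005 syntax:

```sql
CREATE PROCEDURE dbo.SP_BeginOrder
    @OrderId int
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;   -- any error rolls back the whole unit of work

    DECLARE @h uniqueidentifier, @body xml;
    SET @body = N'<order id="' + CAST(@OrderId AS nvarchar(20)) + N'"/>';

    BEGIN TRANSACTION;

    BEGIN DIALOG CONVERSATION @h
        FROM SERVICE [Service_HandleOrderState]
        TO SERVICE N'Service_PreprocessOrder'
        ON CONTRACT [Contract_PreprocessOrder]   -- hypothetical contract
        WITH ENCRYPTION = OFF;

    -- Remember the handle so later replies can be correlated with the order.
    UPDATE dbo.Table_Orders
       SET PreprocessingHandle = @h
     WHERE OrderId = @OrderId;

    SEND ON CONVERSATION @h
        MESSAGE TYPE [Message_PreprocessOrder] (@body);

    COMMIT TRANSACTION;
    -- The conversation is deliberately left open (no END CONVERSATION here).
END;
```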
Notes:
specs specify MSSQL compatibility 2005 through 2012, so:
no CONVERSATION GROUPS
no CONVERSATION PRIORITY
no POISON_MESSAGE_HANDLING ( STATUS = OFF )
I am striving to achieve overall flow integrity and continuity, not speed
given that tables and SPs reside in DB1 whilst Service Broker objects (messages, contracts, queues, services) reside in DB2, DB2 is SET TRUSTWORTHY
Questions:
Are there any major design flaws in the described architecture ?
Order completion state tracking doesn't seem right. Is there a better method ? Maybe using QUEUE RETENTION ?
My intuition tells me that in no case whatsoever should the activated external exe terminate with an exit code other than 0, so there should be try{..}catch(Exception e){..} finally{ Environment.Exit(0) } in Main. Is this assumption correct ?
How would you organize error handling in DB code ? Is an error log table enough?
How would you organize error handling in external exe C# code ? Same error logging table ?
I've seen the SQL Server Service Broker Product Samples, but the Service Broker Interface seems overkill for my seemingly simpler case. Any alternatives for a simpler Service Broker object model ?
Any cross-version "portable" admin tool for Service Broker capable of at least draining poison messages ?
Have you any decent code samples for any of the above ?
Q: Are there any major design flaws in the described architecture ?
A: A couple of minor nits:
- Waiting for an HTTP request to complete while holding a transaction open is bad. You can't achieve transactional consistency between a database and HTTP anyway, so don't risk having a transaction stretch for minutes when the HTTP call is slow. The typical pattern is to {begin tran/receive/begin conversation timer/commit}, then issue the HTTP call without any DB transaction. If the HTTP call succeeds, then {begin tran/send response/end conversation/commit}. If the HTTP call fails (or the client crashes), let the conversation timer activate you again. You'll get a timer message (no body), so you need to pick up the item id associated with the handle from your table(s).
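The timer pattern above can be sketched as the T-SQL a client would issue around the HTTP call. `dbo.Queue_ProcessOrderItem` comes from the question; the reply message type `Message_ItemDone` and the 600-second retry window are hypothetical choices:

```sql
DECLARE @h uniqueidentifier, @type sysname, @body varbinary(max);

-- Transaction 1: take the request and arm a retry timer, then commit.
BEGIN TRANSACTION;
WAITFOR (
    RECEIVE TOP (1)
        @h    = conversation_handle,
        @type = message_type_name,
        @body = message_body
    FROM dbo.Queue_ProcessOrderItem
), TIMEOUT 5000;

IF @h IS NOT NULL
    BEGIN CONVERSATION TIMER (@h) TIMEOUT = 600;   -- seconds until a retry
COMMIT TRANSACTION;

-- The HTTP request and PDF write happen here, in the client,
-- with no database transaction open.

-- Transaction 2, only if the HTTP call succeeded: reply and close.
IF @h IS NOT NULL
BEGIN
    BEGIN TRANSACTION;
    SEND ON CONVERSATION @h MESSAGE TYPE [Message_ItemDone] (@body);  -- hypothetical type
    END CONVERSATION @h;   -- no timer message is delivered for an ended conversation
    COMMIT TRANSACTION;
END

-- If the call fails or the client dies, the timer later delivers a message of type
-- N'http://schemas.microsoft.com/SQL/ServiceBroker/DialogTimer' (empty body)
-- on the same handle; the item must then be looked up by @h in your own table.
```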
Q: Order completion state tracking doesn't seem right. Is there a better method ? Maybe using QUEUE RETENTION ?
A: My one critique of your state tracking is the dependency on scanning the order items to determine that the item currently being processed is the last one (5.3.4). For example, you could record in the item state that this is the 'last' item to be processed, so that when processing it you know you need to report completion. RETENTION is only useful for debugging or when you have logic that needs to run a 'logical rollback' and compensating actions on conversation error.
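A sketch of the "mark the last item" idea, assuming a hypothetical `IsLast bit` column and an `OrderItemId` key on Table_OrderItems. It relies on the items being sent on one conversation in `OrderItemId` order (Service Broker delivers a conversation's messages in order) and on a single reader processing them serially:

```sql
DECLARE @OrderId int, @OrderItemId int, @IsLast bit;
SET @OrderId = 42;        -- placeholder

-- Enqueuing side, before the cursor sends Message_ProcessOrderItem in
-- OrderItemId order: flag the item that will be sent last.
UPDATE dbo.Table_OrderItems
   SET IsLast = CASE WHEN OrderItemId = (SELECT MAX(OrderItemId)
                                           FROM dbo.Table_OrderItems
                                          WHERE OrderId = @OrderId)
                     THEN 1 ELSE 0 END
 WHERE OrderId = @OrderId;

-- Processing side: read the flag for the current item instead of scanning
-- every item of the order.
SET @OrderItemId = 1;     -- placeholder, would come from the message body
SELECT @IsLast = IsLast
  FROM dbo.Table_OrderItems
 WHERE OrderItemId = @OrderItemId;

IF @IsLast = 1
    PRINT 'last item: send Message_ProcessingDone and end the conversation';
```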
Q: My intuition tells me that in no case whatsoever should the activated external exe terminate with an exit code other than 0, so there should be try{..}catch(Exception e){..} finally{ Environment.Exit(0) } in Main. Is this assumption correct ?
A: The most important thing is for the activated process to issue a RECEIVE statement on the queue. If it fails to do so, the queue monitor may stay in the NOTIFIED state forever. The exit code is, if I remember correctly, irrelevant. As with any background process, it is important to catch and log exceptions; otherwise you'll never even know it has a problem when it starts failing. In addition to disciplined try/catch blocks, hook up Application.ThreadException for UI apps and AppDomain.UnhandledException for both UI and non-UI apps.
Q: How would you organize error handling in DB code ? Is an error log table enough?
A: I will follow up later on this. An error log table is sufficient, IMHO.
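A minimal sketch of an error log table used from an activated procedure's CATCH block. All object names are hypothetical. Note that rolling back puts the message back on the queue, and repeated rollbacks will eventually disable it, so a sturdier variant commits the RECEIVE and ends the conversation with an error instead:

```sql
CREATE TABLE dbo.BrokerErrorLog (
    LogId          int IDENTITY(1,1) PRIMARY KEY,
    LoggedAt       datetime       NOT NULL DEFAULT GETUTCDATE(),
    ErrorNumber    int            NULL,
    ErrorSeverity  int            NULL,
    ErrorProcedure nvarchar(128)  NULL,
    ErrorLine      int            NULL,
    ErrorMessage   nvarchar(4000) NULL,
    MessageBody    varbinary(max) NULL
);
GO
CREATE PROCEDURE dbo.usp_HandleOrderState
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;
    DECLARE @h uniqueidentifier, @type sysname, @body varbinary(max);

    BEGIN TRY
        BEGIN TRANSACTION;
        WAITFOR (
            RECEIVE TOP (1) @h    = conversation_handle,
                            @type = message_type_name,
                            @body = message_body
            FROM dbo.Queue_HandleOrderState
        ), TIMEOUT 1000;

        -- ... dispatch on @type here ...

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
        -- Logged after the rollback so the log row itself is kept.
        INSERT INTO dbo.BrokerErrorLog
            (ErrorNumber, ErrorSeverity, ErrorProcedure, ErrorLine, ErrorMessage, MessageBody)
        VALUES
            (ERROR_NUMBER(), ERROR_SEVERITY(), ERROR_PROCEDURE(), ERROR_LINE(), ERROR_MESSAGE(), @body);
    END CATCH
END;
```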
Q: How would you organize error handling in external exe C# code ? Same error logging table ?
A: I created bugcollect.com exactly because I had to handle such problems with my own apps. The problem is more than logging: you also want some aggregation and analysis (at least detecting duplicate reports) and to suppress floods of errors caused by some deployment config mishap in the field. Truth be told, nowadays there are more options, e.g. exceptron.com. And of course I think FogBugz also has logging capabilities.
Q: I've seen the SQL Server Service Broker Product Samples, but the Service Broker Interface seems overkill for my seemingly simpler case. Any alternatives for a simpler Service Broker object model ?
A: Finally, an easy question: yes, it is overkill. There is no simple model.
Q: Any cross-version "portable" admin tool for Service Broker capable of at least draining poison messages ?
A: The problem with poison messages is that the definition of poison message changes with your code: the poison message is whatever message breaks the current guards set in place to detect it.
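Lacking a portable tool, one manual way to pull a single offending message out of a disabled queue is sketched below. It removes whatever message is at the head of the queue, which is only the poison one if that message is indeed first; the queue comes from the question, while the `dbo.PoisonMessages` table and error code are hypothetical:

```sql
CREATE TABLE dbo.PoisonMessages (
    RemovedAt          datetime         NOT NULL DEFAULT GETDATE(),
    ConversationHandle uniqueidentifier NOT NULL,
    MessageTypeName    sysname          NOT NULL,
    MessageBody        varbinary(max)   NULL
);
GO
-- 1. Re-enable the queue but keep activation off, so nothing else receives.
ALTER QUEUE dbo.Queue_ProcessOrderItem WITH STATUS = ON, ACTIVATION (STATUS = OFF);

-- 2. Take the head message, keep a copy, and close its conversation with an error.
DECLARE @h uniqueidentifier, @type sysname, @body varbinary(max);
BEGIN TRANSACTION;
RECEIVE TOP (1) @h = conversation_handle, @type = message_type_name, @body = message_body
FROM dbo.Queue_ProcessOrderItem;

IF @h IS NOT NULL
BEGIN
    INSERT INTO dbo.PoisonMessages (ConversationHandle, MessageTypeName, MessageBody)
    VALUES (@h, @type, @body);
    END CONVERSATION @h WITH ERROR = 50000 DESCRIPTION = N'Removed as poison message';
END
COMMIT TRANSACTION;

-- 3. Turn activation back on.
ALTER QUEUE dbo.Queue_ProcessOrderItem WITH ACTIVATION (STATUS = ON);
```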
Q: Have you any decent code samples for any of the above ?
A: No
One more point: try to avoid any reference between DB1 and DB2 (e.g. in 4.3.4 the activated procedure reads the items table across the database boundary). This creates cross-DB dependencies, which break when a) one DB is offline (e.g. for maintenance) or overloaded, or b) you add database mirroring for HA/DR and one DB fails over. Try to make the code work even if DB1 and DB2 are on different machines (and without linked servers). If necessary, add more info to the message payload. If you architect it that way, DB2 can be on a different machine, and multiple DB2 machines can even exist to scale out the HTTP/PDF writing work.
And finally: this design will be very slow. I'm talking low tens of messages per second, with so many dialogs/messages involved and everything running with MAX_QUEUE_READERS = 1. This may or may not be acceptable for you.
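A minimal sketch of the routing piece that lets a service live on another instance without linked servers. The address is a placeholder (4022 is just the port commonly used in examples); a complete setup also needs a Service Broker endpoint on each instance, a return route, and transport/dialog security, none of which is shown:

```sql
-- Run in the database that initiates conversations to Service_ProcessOrderItem.
CREATE ROUTE Route_ProcessOrderItem
WITH SERVICE_NAME = N'Service_ProcessOrderItem',
     ADDRESS      = N'TCP://db2-host.example.com:4022';
```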
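For the internally activated queue, the reader count is a queue setting. A sketch with a hypothetical procedure name: RECEIVE locks the conversation group it returns messages from, so more readers do not process messages of the same conversation concurrently. The externally activated exes are governed by the External Activator's own configuration, not by this setting:

```sql
ALTER QUEUE dbo.Queue_HandleOrderState
    WITH ACTIVATION (
        STATUS            = ON,
        PROCEDURE_NAME    = dbo.usp_HandleOrderState,   -- hypothetical name
        MAX_QUEUE_READERS = 4,
        EXECUTE AS OWNER
    );
```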

Queue stops (disables) without any poison message

I have a queue that stops for no apparent reason. In this queue I have implemented poison message handling: during processing, it records and discards any poison messages.
It worked fine for more than a year without stopping. But recently (the problem began four weeks ago), it stops once or twice a week. This week alone it stopped twice.
And when I check the table for new poisoned messages, there are none! And when I enable the queue, processing resumes successfully and the 'poison message' situation does not reproduce.
About the task of the queue: it receives about 2,000-3,000 messages per day. It is used to run stored procedures outside the transaction, and each message can take a while to process (doing a lot of selects, inserts, updates).
Let me explain this point: the database has triggers that fire inside a transaction, and the trigger sends a message to run some code outside the trigger. The asynchronous behavior keeps this work from dragging down database performance.
I have detected that even when a deadlock occurs while processing the messages, the queue treats the message as poisoned. So in principle it shouldn't be a performance problem. But could it be? Maybe the database is growing and it takes too long to process a message?
But how can I find that out if it is not detected as poisoned?
For what other reasons can a queue stop?
How can I record when, and with which message, the queue got disabled?
Does anybody have any idea how I can do some forensic analysis?
Any ideas?
UPDATE WITH A PSEUDO-SOLUTION:
Following Remus's answer, I've tried using an event notification to get the exact moment when the queue stops.
CREATE EVENT NOTIFICATION [QueueDisabledEN]
ON QUEUE [dbo].[ProcessQueue]
FOR BROKER_QUEUE_DISABLED
TO SERVICE 'Queue Watch Service', 'current database';
And then checking the event log:
select * from sys.event_notifications
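Note that sys.event_notifications lists the notifications that are defined; the fired BROKER_QUEUE_DISABLED events themselves arrive as messages on the queue behind 'Queue Watch Service'. A sketch for peeking at them without removing them, assuming that queue is named `dbo.QueueWatchQueue` (hypothetical):

```sql
SELECT message_enqueue_time,
       CAST(message_body AS xml) AS event_xml,
       CAST(message_body AS xml).value('(/EVENT_INSTANCE/EventType)[1]', 'nvarchar(128)') AS event_type,
       CAST(message_body AS xml).value('(/EVENT_INSTANCE/PostTime)[1]',  'nvarchar(64)')  AS post_time
FROM dbo.QueueWatchQueue
WHERE message_type_name = N'http://schemas.microsoft.com/SQL/Notifications/EventNotification'
ORDER BY message_enqueue_time;
```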
But since it is difficult to know the environment in which the event occurred (what else was running at that moment?), the forensic analysis ends there. Fortunately, my broker service implementation stores the messages with the date sent, the date received, the date processed, ... This has helped me detect that within 3 seconds the queue is flooded with hundreds of messages that take too long to process.
Until I find a real solution, my only temporary workaround is an Agent job that checks the status of the queue every x minutes and re-enables it:
IF EXISTS (SELECT * FROM sys.service_queues
           WHERE name = 'ProcessQueue'
             AND (is_receive_enabled = 0 OR is_enqueue_enabled = 0))
BEGIN
    PRINT CONVERT(nvarchar(30), GETDATE(), 121) + ': Enabling queue ProcessQueue';
    ALTER QUEUE ProcessQueue WITH STATUS = ON;
END
Thanks Remus!
When you find the queue disabled and re-enable it, I assume that processing resumes successfully and the 'poison message' situation does not reproduce. This would indicate that the cause is transient or time-related. It could be a SQL Agent job that runs and causes deadlocks with the queue processing, forcing the queue processing to roll back. In my experience, deadlocks are the most typical cause of poison messages. Your best forensic tool is the system event log, as the activated procedure outputs errors into the ERRORLOG and hence into the system Event Log.
Whenever a queue is disabled by the poison message trigger (5 consecutive rollbacks), an event notification of type BROKER_QUEUE_DISABLED is fired. You can capture more forensic information when handling this event, as it runs shortly after the moment the queue was disabled.
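Since the ERRORLOG is the main forensic source here, one way to get full deadlock details written into it is trace flag 1222. Enabling it instance-wide is a choice you may or may not want on your server:

```sql
-- Write details of every deadlock to the SQL Server ERRORLOG
-- (instance-wide until restart; use the -T1222 startup parameter to persist).
DBCC TRACEON (1222, -1);

-- Confirm the flag is active globally.
DBCC TRACESTATUS (1222);
```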
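A sketch of such a handler: an activation procedure on the watch queue that, for each notification, snapshots the requests running at that moment. Table, procedure and queue names are hypothetical. Activated procedures run under an impersonated, database-scoped context, so seeing other sessions' requests may require VIEW SERVER STATE granted through a certificate-signed procedure:

```sql
CREATE TABLE dbo.QueueDisabledSnapshot (
    CapturedAt          datetime      NOT NULL DEFAULT GETDATE(),
    EventXml            xml           NULL,
    session_id          smallint      NULL,
    status              nvarchar(30)  NULL,
    command             nvarchar(32)  NULL,
    wait_type           nvarchar(60)  NULL,
    blocking_session_id smallint      NULL,
    sql_text            nvarchar(max) NULL
);
GO
CREATE PROCEDURE dbo.usp_OnQueueDisabled
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @h uniqueidentifier, @body xml;

    BEGIN TRANSACTION;
    RECEIVE TOP (1) @h = conversation_handle, @body = CAST(message_body AS xml)
    FROM dbo.QueueWatchQueue;

    -- One row per running request, tagged with the notification payload.
    IF @h IS NOT NULL
        INSERT INTO dbo.QueueDisabledSnapshot
            (EventXml, session_id, status, command, wait_type, blocking_session_id, sql_text)
        SELECT @body, r.session_id, r.status, r.command, r.wait_type,
               r.blocking_session_id, st.text
        FROM sys.dm_exec_requests AS r
        OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
        WHERE r.session_id <> @@SPID;

    COMMIT TRANSACTION;
END;
```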
As a side note, you can never have true 'poison message handling'. Whenever you enhance the processing to handle some error cases, the definition of the 'poison message' changes to be the message capable of disabling the new error handling.

SQL Profiler cant catch Deadlock Graph Event

I am trying to resolve deadlocks. My application gets deadlocks all the time when there are more than 10 users at the same time.
I have tried with SQL Profiler and can't figure it out.
The thing is, in SQL Profiler I have selected the Deadlock Graph event, but when I run the trace the event never gets logged. I can see many Deadlock and Deadlock Chain events, but no Deadlock Graph. Please advise.
Thanks for the help
You need to have only Locks->Deadlock graph selected if you want to see Deadlock graph event only.
When you set up a filter for database name or database ID, the DeadlockGraph event is not captured, even if you don't check "Exclude rows that do not contain values".
If you filter for, say, Duration or NTUserName, which neither are populated by DeadlockGraph, the event is included (as long as you don't filter for the database, that is.)
Likewise, if you add LockAcquired and filter for DatabaseName (not populated by LockAcquired), the event is included.
So the problem is with this precise combination.
Refer:
https://connect.microsoft.com/SQLServer/feedback/details/240737/filtering-for-database-name-id-filters-out-deadlock-graph-when-it-shouldnt
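As an alternative to Profiler (not part of the answer above), the same deadlock graph can be captured server-side with an Extended Events session that applies no database filter. This uses SQL Server 2012+ syntax (2008/2008 R2 used `package0.asynchronous_file_target` instead); the session and file names are arbitrary. The built-in `system_health` session also records `xml_deadlock_report` by default:

```sql
CREATE EVENT SESSION [CaptureDeadlocks] ON SERVER
ADD EVENT sqlserver.xml_deadlock_report
ADD TARGET package0.event_file (SET filename = N'CaptureDeadlocks.xel');
GO
ALTER EVENT SESSION [CaptureDeadlocks] ON SERVER STATE = START;
```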

SQL error: String or binary data would be truncated

I'm doing an integration on a community platform called Telligent. I'm using a 3rd-party add-on called BlogML to import blog posts from an XML file (in BlogML format) into my local Telligent site. The Telligent platform comes with many classes in their SDK so that I can programmatically add content, such as blog posts. E.g.
myWeblogService.AddPost(myNewPostObject);
The BlogML app I'm using essentially parses the XML and creates blog post objects then adds them to the site using code like the above sample line. After about 40 post imports I get a SQL error:
Exception Details: System.Data.SqlClient.SqlException:
String or binary data would be truncated.
The statement has been terminated.
I believe this error means that I'm trying to insert too much data into a db field that has a max size limit. Unfortunately, I cannot tell which field this is an issue for. I ran the SQL Server Profiler while doing the import but I cannot seem to see what stored procedure the error is occurring on. Is there another way to use the profiler or another tool to see exactly what stored procedure and even what field the error is being caused by? Are there any other tips to get more information about where specifically to look?
Oh the joys of 3rd-party tools...
You are correct in that the exception is due to trying to stuff too much data into a character/binary based field. Running a trace should definitely allow you to see which procedure/statement is throwing the exception if you are capturing the correct events, those you'd want to capture would include:
1. SQL:BatchStarting
2. SQL:BatchCompleted
3. SQL:StmtStarting
4. SQL:StmtCompleted
5. RPC:Starting
6. RPC:Completed
7. SP:Starting
8. SP:Completed
9. SP:StmtStarting
10. SP:StmtCompleted
11. Exception
If you know for certain that a stored procedure contains the faulty code, you could do away with capturing events 1-4. Be sure you capture all associated columns in the trace as well (this should be the default if you are running a trace using the Profiler tool). The Exception class will include the actual error in your trace, which should allow you to see the immediately preceding statement within the same SPID that threw the exception. You must include the starting events in addition to the completed events, as an exception will prevent the associated completed events from firing in the trace.
If you can filter your trace to a particular database, application, host name, etc. that will certainly make it easier to debug if you are on a busy server, however if you are on an idle server you may not need to bother with the filtering.
Assuming you are using SQL Server 2005+, the trace will include a column called 'EventSequence', which is basically an incrementing value ordered by the sequence in which events fire. Once you run the trace and capture the output, find the 'Exception' event that fired (in Profiler, that row will be shown in red); then you should be able to simply find the most recent SP:StmtStarting or SQL:StmtStarting event for the same SPID that occurred before the Exception.
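The same EventSequence search can be done in T-SQL once the trace is saved to a file. A sketch, assuming the placeholder path below; the event class numbers are 33 (Exception), 40 (SQL:StmtStarting) and 44 (SP:StmtStarting):

```sql
-- Load the saved trace once into a temp table.
SELECT EventSequence, EventClass, SPID, Error, ObjectName,
       CAST(TextData AS nvarchar(max)) AS TextData
INTO #trace
FROM sys.fn_trace_gettable(N'C:\Traces\BlogMLImport.trc', DEFAULT);

-- For each Exception, find the latest statement start on the same SPID before it.
SELECT e.EventSequence AS exception_seq,
       e.SPID,
       e.Error,
       s.EventSequence AS statement_seq,
       s.ObjectName,
       s.TextData      AS statement_text
FROM #trace AS e
OUTER APPLY (
    SELECT TOP (1) t.EventSequence, t.ObjectName, t.TextData
    FROM #trace AS t
    WHERE t.SPID = e.SPID
      AND t.EventClass IN (40, 44)
      AND t.EventSequence < e.EventSequence
    ORDER BY t.EventSequence DESC
) AS s
WHERE e.EventClass = 33;
```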
Here is a screen shot of a profile I captured reproducing an event similar to yours:
You can see the exception line in red, and the highlighted line is the immediately preceding SP:StmtStarting event that fired before the exception for the same SPID. If you want to find which stored procedure this statement is part of, look at the values in the ObjectName and/or ObjectId columns.
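A quick way to reproduce the error, plus a note on newer versions: SQL Server 2019 (at database compatibility level 150) replaces message 8152 with message 2628, which names the table, column and truncated value, and SQL Server 2016 SP2 CU6 / 2017 CU12 can do the same with trace flag 460. The temp table is a throwaway example:

```sql
-- A deliberately short column to trigger the truncation error.
CREATE TABLE #Posts (Title nvarchar(10));

-- Fails: the value is longer than 10 characters.
-- Older versions report Msg 8152; newer ones (see above) can report Msg 2628.
INSERT INTO #Posts (Title) VALUES (N'A title longer than ten characters');

DROP TABLE #Posts;
```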
You can also get this error through a simple mistake, for example when inserting a string like:
String reqName="Food Non veg /n";
Here /n is the culprit. Remove /n from the string to get rid of this error.
I hope this helps someone.

Resources