SQL Server Service Broker Service Disappearing (Automatically Deleted)? - sql-server

I've implemented a messaging system over SQL Server Service Broker. It is working great, with the sole exception that every once in a while (maybe once per week per server) my initiator service just vanishes without a trace. The corresponding queue is still there, but the service is missing.
Obviously this causes problems in my system. It's a simple matter to recreate the service by hand, but I'm confused as to what might cause this behavior. I understand that automatic poison message handling causes queues to be disabled, but I don't see anything that indicates services can be disabled or deleted automatically.
When this happens, I usually have a large backlog of messages in multiple application queues, but nothing extreme. Total message backlog is around 200,000.
Does anyone know what might be happening here?

You must have a bug of some sort that issues a DROP SERVICE statement. That is the only way a service gets deleted.
Check the default trace: the DROP statement gets traced and saved into it, so you can track down the application, user, and statement that issued the DROP. Query sys.traces to find the location of the default trace, then open the .trc file in Profiler.
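If you'd rather not open Profiler, something along these lines reads the default trace directly (a sketch; the drop should surface as an Object:Deleted event, and you can add a filter on ObjectName for your service name):

    -- Find the current default trace file and scan it for object drops.
    DECLARE @path nvarchar(260);
    SELECT @path = path FROM sys.traces WHERE is_default = 1;

    SELECT t.StartTime, t.LoginName, t.HostName,
           t.ApplicationName, t.ObjectName, t.DatabaseName
    FROM fn_trace_gettable(@path, DEFAULT) AS t
    JOIN sys.trace_events AS e
      ON e.trace_event_id = t.EventClass
    WHERE e.name = 'Object:Deleted'
    ORDER BY t.StartTime DESC;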

Related

Connection tab of Google Cloud SQL instance taking forever to load the console interface

I want to access my cloud database from my computer, but the Connection tab never finishes loading so that I can enter my IPv6 address. This is the second time I'm experiencing this issue, and my network is strong enough. It's now been 20 minutes, and the three dots are still indicating progress that never ends.
The first time it happened I had to leave my computer and go for a walk. This really frustrates me since it's in production and rapid updates should not be delayed.
How can I fix this?
POSSIBLE CAUSE:
It happens after I re-open MySQL Workbench and it fails to connect, the reason being that my IPv6 address has changed, possibly by my Internet Service Provider (ISP) (I don't know of other possible reasons). After MySQL Workbench fails, I go to the console to enter the new address, but then this problem occurs.
I think Cloud SQL security (I don't know the exact name) is treating this as a malicious access attempt and hence imposing this weird delay on immediately subsequent access. If so, then this is purely impractical, since my computer does not tell me that my IPv6 address has changed; besides, normal, regular IPv6 changes can't be treated as malicious, lest developers continue to suffer from this issue.
EDIT: This time it finished loading after approximately 50 minutes.
Have you considered using the Cloud SQL Proxy to connect to your instance instead of white-listing an IP? White-listing an IP can be insecure, since it gives anyone on your network access, and inconvenient (as you have discovered), because if your IP changes you lose access.
The proxy uses a service account to provide authenticated access to your instance, so it will work regardless of your IP (as long as your service account has the correct permissions). Check out these instructions for a guide on starting it up.
(As a side note, it's a difficult problem to tell why your connectivity tab is failing to load. It might be a browser add-on or even a networking failure in your local network that is interfering. You can check the browser dev console to see if any errors appear.)

Performing the synchronization with ExecuteOfflineCommand more effectively

I'm wondering whether there is a way to recognize that the OfflineCommand is being executed, or an internal flag or something to indicate that a command has been passed to the server or has executed successfully. With an unstable internet connection, I have trouble recognizing whether a command went through. I keep retrieving the records from the database and comparing them each and every time to see whether the command has been applied, but due to the flow of my application I'm finding it very difficult to avoid duplicates. Is there any automatic process to make sure commands are executed exactly once, or something else I can use?
Second question: on forms I can use a UITimer to check isOffline() to find out whether the internet is connected. Is there something equivalent on the server side, where the queries are written, to detect that the connection has dropped? When control has moved to the queries and the internet disconnects, the dialog opened from the form page freezes indefinitely and never ends; I have to close and re-open the app to continue the synchronization process. At the same time, I cannot set a timeout on the dialog because I'm not sure how long the synchronization process will take to complete. Please advise.
Extending the same topic, I have created a new issue just to give more clarity on my questions.
executeOfflineCommand skips a command while executing from storage on Android
There is no way to know whether a connection will stay stable, as that would require knowledge of the future. You can work the way transaction services do, where the server side processes an offline command as a transaction using the approach of 2-phase commit.
In this approach you have an algorithm similar to this:
Client sends command to server
Server returns a special unique ID for the command
Client asks server to perform the command with that unique ID
Server acknowledges that the command was performed
If the first 2 stages didn't complete, you just do them again. The worst thing that could happen is some orphaned commands on the server.
If the 3rd step didn't complete, you just do it again. The server knows whether it has processed the command and will just acknowledge it if it was already processed.
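If the backend happens to be a SQL database, the server-side bookkeeping for this can be as small as the following sketch (table and procedure names are hypothetical):

    -- Hypothetical table registering command ids handed out in step 2.
    CREATE TABLE dbo.OfflineCommands (
        CommandId  uniqueidentifier NOT NULL PRIMARY KEY,
        Processed  bit NOT NULL DEFAULT 0,
        ReceivedAt datetime2 NOT NULL DEFAULT SYSUTCDATETIME()
    );
    GO
    -- Steps 3 and 4: execute the command exactly once. A retry with an
    -- already-processed id changes nothing and is simply re-acknowledged.
    CREATE PROCEDURE dbo.ProcessCommand @CommandId uniqueidentifier
    AS
    BEGIN
        SET NOCOUNT ON;
        BEGIN TRANSACTION;
        UPDATE dbo.OfflineCommands
           SET Processed = 1
         WHERE CommandId = @CommandId
           AND Processed = 0;
        IF @@ROWCOUNT = 1
        BEGIN
            -- Do the actual work here, inside the same transaction,
            -- so the work and the Processed flag commit atomically.
            PRINT 'command executed';
        END
        COMMIT;
        -- Either way, return an acknowledgement to the client (step 4).
    END

The key property is idempotency: re-running the procedure with the same id falls through the UPDATE without doing the work again, which is what makes the client's retry loop safe.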

OData HTTP400 Timeout Error

This is one of the most bizarre problems I've come across since I started using OData for my mobile apps. The OData server I've developed is backed by SQL Express 2008 and this combination has been installed on 50 different servers and/or PCs over the last 15 months. All 50 servers have been running stably, functioning consistently with large amounts of data.
A couple of days ago one of my clients contacted me indicating that my client app (running on iOS7) was having an odd error come up when POSTing data to their server. The error had an HTTP code of 400 and the error text is "The operation couldn't be completed. (Timeout error 400.)". My first question is: why is a timeout error coming back with a 400 code? Generally when I get timeouts (due to firewall, etc) they're in the 100x range. There is no indication in the event logs on the server of ANY problems occurring. My own logs (stored in the SQL database) show no error (which is odd because I'm using the generic exception catching method in my OData service to log any problems). I haven't got to the step of adding logging of all requests as yet.
The error is only raised when posting one particular set of data. All other posts from the device function perfectly. I got the client to re-install the app (deleting all data) and then to download the data set that was causing the error. The download worked fine. We then made incremental changes to the data to replicate what it looked like when the error occurred, posting each change to the server and observing the result. Most of the incremental changes work fine, but certain combinations cause the error to occur. One of the increments involves a large volume of changes and it posts fine, but subsequent alteration of any of the objects (sometimes altering as little as 6 characters in a text field) causes the error to occur. And yet in some circumstances altering objects that have already been posted to the server works without a problem.
I wiped the service components from the server and did a fresh install. I shifted TCP ports in case 443 had another listener causing problems. I reset the server. None of these changed the behaviour of the error.
My last ditch solution is to completely re-install IIS and .NET Framework but I'd obviously like to avoid this as it's not my server... The server is overseas from my current location so debugging isn't really an option. Hoping someone has an idea as to what I can do diagnostically to try and determine the source of this bizarre 'gremlin'.
Have you tried a more thorough traffic analysis using a tool like Fiddler? The "timeout" error does indeed seem odd, and what stood out from your post was that your server is "overseas". Could there be something with the "times" that are being used/generated, e.g. server time, local time, etc.?
Just to confirm, the "same" exact set of data always fails? Can you replicate this via a remote debugger or via localhost? If so, can you turn on "verbose errors"?

Service Broker error handling simulation

I'm currently working on a project in which multiple POSes should be synchronized to a main server using the Service Broker feature. I'm now preparing the error handling for this solution and want to show the client how it works. That means I will prepare test scripts for every kind of error, and the client will run them on a test POS to see whether the errors are processed correctly.
We will use SQL Server 2008 R2 with poison message handling = OFF.
Message type = XML (but it can carry different types of data inside; some nodes will contain BLOBs).
POSes will be outside of the domain, so transport will be secured (but no dialog encryption).
I divide the errors into several sub-groups:
1. Logical errors (e.g. a string instead of a number). These will be processed by a TRY-CATCH block on the server side. They are easy to simulate.
2. Service Broker configuration errors (a message is not returned or cannot reach its destination). I think these can be handled by using SQL Server Service Broker events, and the simulation will be some kind of "bad configuration" (SB GUID, service name, etc.).
3. Transport errors. This is when we have a broken message. In fact, it is the client's wish to test this kind of error. I do not know whether, with a secured transport level (certificate), we are protected from this kind of error. Another question is how I can simulate it.
Questions:
Are there other error types?
Is the error handling logic described for #2 good enough?
How do I handle and simulate #3?
The second part of my article here goes into a discussion of Service Broker errors, how they occur and how to handle them. The important thing for you is to distinguish between two categories of errors:
recoverable: transport problems, and most configuration errors like bad routing or an unreachable server. All of these result not in an SSB error but in a delay. Messages stay in sys.transmission_queue on the expectation that the problem is transient and can be solved, including some configuration problems. Once the problem is solved, SSB retries and the message gets delivered (see the query sketch after this list).
unrecoverable: these are problems SSB deems non-recoverable, e.g. a bad message format. In such a case the conversation is aborted and both endpoints receive an Error message.
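A quick way to watch the recoverable category in action is to look at what is waiting to be delivered; the transmission_status column spells out why a message is still being retried:

    -- Messages delayed by recoverable problems sit here until delivery
    -- succeeds; transmission_status explains what SSB is waiting for.
    SELECT conversation_handle, to_service_name,
           enqueue_time, transmission_status
    FROM sys.transmission_queue
    ORDER BY enqueue_time;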
I also have an article, Error Handling in Service Broker procedures, that discusses some of the topics particular to exception handling in an SSB activated context.
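For category #1 (logical errors), that server-side TRY-CATCH handling typically lives in the activated procedure. Here is a minimal sketch; the queue name and error number are illustrative, and ending the conversation with an error is just one of several possible reactions:

    -- Minimal activated procedure: receive, parse, and let failures
    -- land in the CATCH block.
    CREATE PROCEDURE dbo.TargetQueueHandler
    AS
    BEGIN
        SET NOCOUNT ON;
        DECLARE @h uniqueidentifier, @type sysname, @body varbinary(max);
        BEGIN TRY
            BEGIN TRANSACTION;
            WAITFOR (
                RECEIVE TOP (1)
                    @h    = conversation_handle,
                    @type = message_type_name,
                    @body = message_body
                FROM dbo.TargetQueue
            ), TIMEOUT 5000;
            IF @h IS NOT NULL
            BEGIN
                -- Casting/shredding the XML is where "string instead of
                -- number" logical errors surface and jump to CATCH.
                DECLARE @xml xml = CAST(@body AS xml);
                -- ... process @xml ...
            END
            COMMIT;
        END TRY
        BEGIN CATCH
            IF XACT_STATE() <> 0 ROLLBACK;
            -- Log ERROR_MESSAGE(); one option is to end the conversation
            -- with an error so the sending POS is notified of the failure.
            IF @h IS NOT NULL
                END CONVERSATION @h
                    WITH ERROR = 50001 DESCRIPTION = 'processing failed';
        END CATCH
    END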
A final note: I strongly discourage you from turning poison message detection OFF. It is much better to have the processing disabled than to spin ad nauseam without making progress because of a poison message.
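If the worry is a queue silently going dark, an event notification can alert you the moment poison-message handling disables it (queue and service names here are illustrative):

    -- Deliver a message to a monitoring queue whenever poison-message
    -- handling disables the target queue.
    CREATE QUEUE dbo.MonitorQueue;
    CREATE SERVICE MonitorService ON QUEUE dbo.MonitorQueue
        ([http://schemas.microsoft.com/SQL/Notifications/PostEventNotification]);
    CREATE EVENT NOTIFICATION TargetQueueDisabled
        ON QUEUE dbo.TargetQueue
        FOR BROKER_QUEUE_DISABLED
        TO SERVICE 'MonitorService', 'current database';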
As for how to simulate a corrupted message: it is hard to do (you can try setting up a port forwarder that lets all traffic pass through but randomly corrupts some of it), but it is rather pointless. All SSB traffic, even when in clear text, is cryptographically signed, and any message corruption would result in an abrupt disconnect due to a message signing validation failure.

Site update, testing was fine, after deployment, again fine, once user load increases, FAIL?

We are using ASP.NET MVC with LINQ to SQL. We added some features and tested them all to perfection on our QA box. We are using Windows Server 2003 and SQL Server 2005. So when we pushed out changes to the Live web server we also used Red Gate SQL Compare to push new database changes to the LIVE database. We tested again between the few of us, no problems. Time for bed.
The morning comes, users start hitting the app, and BOOM. We have no idea why this would happen, as we have not been doing any new kinds of code things that we were not doing before. However, we did notice during the SQL Compare sync that the names of all the foreign keys were different between the two databases (not the IDs in the tables), e.g. FK_AssetAsset_A0EB67 versus FK_AssetAsset_B67EF8 (I don't remember the exact trailing mixed characters from the SQL Compare). We are not sure why, but that is another variable in this problem.
Strangely once this was all pushed out we could then replicate the errors on QA, but not before everything was pushed to LIVE.
QA and LIVE databases are on the same SQL Server, but the apps are on different instances of Windows Server 2003.
Errors generated:
Index was outside the bounds of the array.
Invalid attempt to call FieldCount when reader is closed.
Server failed to resume the transaction.
There is already an open DataReader associated with this Command which must be closed first.
A transport-level error has occurred when sending the request to the server.
A transport-level error has occurred when receiving results from the server.
Invalid attempt to call Read when reader is closed.
Invalid attempt to call MetaData when reader is closed.
Count must be positive and count must refer to a location within the string/array/collection. Parameter name: count
ExecuteReader requires an open and available Connection. The connection's current state is connecting.
Anyone have any idea what the heck could have happened?
EDIT: Since we were able to replicate the errors all of a sudden on QA, it might not be a user load issue... Needless to say we all feel really screwed here.
Concurrency always brings bugs out of the woodwork. I'd recommend you check for objects that could be shared among requests (such as static members and singletons) and refactor your code so that as little as possible is shared.
As far as specifics go, for the error "There is already an open DataReader associated with this Command which must be closed first," you may want to try adding MultipleActiveResultSets=True to your connection strings.
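For example, a connection string with MARS enabled might look like this (server and database names are placeholders):

    Server=myServer;Database=myDb;Integrated Security=True;MultipleActiveResultSets=True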
It sounds like you're crossing the streams a bit and trying to share DataContexts across requests. My suggestion would be to wire in a dependency injection framework that creates a new instance of the dependency for each request.
I use Castle's IoC container and wire it into the controller factory so that when it sees a dependency on a repository it creates a new instance of that repository for each request. If you go this route, let me know and I can shoot you a few more resources.
