Service Broker error handling simulation - sql-server

I work now in project in which multiple POSes should be synchronized to main server by using Server Broker feature. Now i prepare error handling for this solution and want to show to client how it works. That means i will prepare test scripts for every kind of errors and client runs it on test POS to see if it errors processed correctly.
We will use SQL Server 2008R2 with poison message = OFF.
Message type=XML (but inside can be different type of data, some nodes will contain BLOBs).
POSes will be outside of domen so transport will be secured (but no dialog encryption).
I divide errors on several sub-groups:
Logical error (e.g. string instead
of number) .It will be processed by
TRY-CATCH block on server side.It is
easy to simulate
Service Broker configuration error
(message or will be not returned or
cannot reach destination). I think
it can be handled by using SQl
Server Service Broker events and
simulation will be some kind of "bad
configuration" (SB GUID,service name
etc)
Transport error. This is when we
have a broken message. In fact it is
client opinion to test such kind of
error. I do not know if we have
secured transport level
(certificate) we are protected from
such kind of error. Another question
how can I simulate this.
Questions:
are there another error types?
is #2 error handling logic described good enough?
how to handle and simulate #3?

The second part of my article here goes into a discussion of Service Broker errors, how they occur and how to handle them. The important thing for you is to distinguish between two categories of errors:
recoverable: transport problems, most configuration errors like bad routing, an unreachable server. All these will result not in a SSB error, but in a delay. Messages will stay in transmission_queue expecting that the problem is transient and can be solved, including some configuration problems. Once the problem is solved, SSB will retry and the message gets delivered.
unrecoverable: these are problems SSB deems as non-recoverable, eg. a bad message format. In such a case the conversation will be aborted and both endpoints receive a Error message.
I also have an article Error Handling in Service Broker procedures that discusses some of the topics particular to exception handling in SSB activated context.
A final note: I strongly discourage you from turning poison message detection OFF. It is much better to disable the processing than to spin ad-nauseam w/o making progress because of a poison message.
As on the topic on how to simulate a corrupted message: is hard to simulate (you can try with setting up a port forwarder that lets all traffic pass by, but randomly corrupts some of it) but is rather pointless. All SSB traffic, even when in clear text, is cryptographically signed and any message corruption would result in an abrupt disconnect due to message signing validation failure.

Related

Google PubSub Simultaneous Publish Requests

In Google PubSub, the publish call from the client can be called asynchronously. Because of this, I would think that it would be possible to have multiple publish requests triggered and sent to the server, all at the same time, especially if the batch thresholds are too low.
If this is true, how does the pubsub client control the number of simultaneous publish requests that can be created? Is there a hard limit, or an error that can occur if too many requests are created? Is this the intended use of having an asynchronous publisher, or is simply to allow for other non-publishing activity to occur?
Though this question applies to any of the clients, we are specifically having an issue with the C# client, and are intermittently receiving the following error:
Grpc.Core.RpcException: Status(StatusCode=DeadlineExceeded, Detail="Deadline Exceeded")
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Google.Api.Gax.Grpc.ApiCallRetryExtensions.<>c__DisplayClass0_0`2.<<WithRetry>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
My thought is that we are sending too many publish requests..., but I am not sure.
I would advice using the raw gRPC code, but use the client library that has a very thin wrapper.
Looking at the client source code always helps me, you can find for c# code here PublisherClient.cs (thin wrapper)
If you are using PublishAsync it queues/batch the messages anyway, the behaviour is controlled the settings you give to the client (see PublisherServiceApiClient for how to tune it). You can also control the number of client connections that are used to send the queues in the client. I suggest playing with the batch-size first, then the number of connections till you found your sweet spot for your throughput.

GAE custom Go runtime - internal.flushLog error

I have recently changed to use custom Go runtime on GAE, and noticed many errors like this from logs:
internal.flushLog: Flush RPC: Call error 3: invalid security ticket: 6c8027dc99b3ed3e
internal.flushLog: Flush RPC: Canceled: (timeout)
The server is still running well, but I have no idea about that error, as well as why it happens.
I'm using a custom Go runtime by using Dockerfile, and App Engine Release is 1.9.37.
Any help to clarify the error would be highly appreciated. Thanks.
This is a known issue with the Go runtime on App Engine Flexible. It tends to happen when a line is logged right before the end of a request/response.
What happens is that when the line is logged it is actually put in a list of log lines to be batched together and sent to the application server as an RPC at periodic intervals. The security ticket is canceled at the end of a request/response which sometimes can happen before the log lines have been flushed. It's harmless, except that you may lose a log line or two. :\
We're actively working on fixing it.

App Engine silently fails on some requests

Some requests silently fail in my python app, intermittently and unpredictably. The hallmarks of the failure are:
Request returns a 200, so the client doesn't know there's a problem.
Request does NOT successfully execute on the server.
No logging statements are recorded for the request.
Below is an example from my logs of a bunch of requests which are each supposed to write an entity to the datastore. You can see for the lower, successful request, a blue 'i' is present, indicating that info level logs were recorded. When I examine the datastore, an entity was successfully written for this request.
However, for the failed request, you can see there is just a white box, and there are no logging statements present at all. While the server returned a 200, no entity was written to the datastore for this request.
Has anyone encountered something like this before on App Engine? Any ideas on how to debug it? I've seen it in multiple different apps myself, but I've never been able to figure it out.
EDIT
To clarify, the main problem here is that code doesn't execute, as measured by the failure to write an entity. The spurious 200 and lack of logging is an associated symptom.
From a comment originally, but seems to be the resolution path for this issue:
Given that there are no log statements at all in the line and you appear to unpack the arguments and log them as soon as you enter the handler, this starts to look like an infrastructure/platform issue.
In such a case, it's best to open a public issue tracker issue, with "Type-Production" as a tag, including your app's app id and a timeframe, and as much information about your app and request handler involved as possible, and platform support will pick up the issue in the course of triage.
That said, it's worth examining the handler to make absolutely sure there's no way you could be exiting from the handler and sending a 200 without logging anything or seeing an exception. It all depends on what the code handling the request is capable of, what stack of libraries it's build upon, etc.

App engine channel successfully created but unusable

I have been getting an intermittent problem using the App Engine channel API. For the most, maybe 90% of the time, everything works fine. But the remaining 10% of the time I get a channel that is unusable. Having looked at this code for months, I strongly believe that this problem is not due to a logical error. By unusable channel I mean that even though the client connects to it successfully, the server is not able to message it. Most of the operations involved on the client and server complete successfully:
On the server, I create a channel with a new client id unique to the session
The client fetches the corresponding token and connects to it
On the client, onOpen() is called on the channel socket
The one thing that doesn't succeed is the calling of /_ah/channel/connected for these defective channels. I've tried dozens of possible workarounds without success. Right now I deal with the problem by gracefully retrying till I succeed, but it would be really nice for it to work without these tricks.
I havent seen any code but from what you are saying could it be related to
Intermittent error code 400, description “” on client connecting to channel
I am using a kind of brute force loop messaging to all client sockets (even if they have been closed, its a bit redundant but the overhead seems low ) and haven't picked up any problems yet (I also havent tested it that well either)
It seems that they fixed a leak in the channels APi in the last release 1.8.2: https://code.google.com/p/googleappengine/issues/detail?id=9283
https://code.google.com/p/googleappengine/wiki/SdkForJavaReleaseNotes

SQL Server Service Broker Service Disappearing (Automatically Deleted)?

I've implemented a messaging system over SQL Server Service Broker. It is working great, with the sole exception that every once in a while (maybe once per week per server) my initiator service just vanishes without a trace. The corresponding queue is still there, but the service is missing.
Obviously this causes problems in my system. It's a simple matter to recreate the service by hand, but I'm confused as to what might cause this behavior. I understand that automatic poison message handling causes queues to be disabled, but I don't see anything that indicates services can be disabled or deleted automatically.
When this happens, I usually have a large backlog of messages in multiple application queues, but nothing extreme. Total message backlog is around 200,000.
Does anyone know what might be happening here?
You must have a bug of some sort that issues a DROP SERVICE statement. That is the only way a service gets deleted.
Check the default trace, the DROP statement gets traced and saved into it so you can track down the application/user/statement that issues the DROP. Check sys.traces to find the location of the default trace then open the .TRC file in Profiler.

Resources