App engine channel successfully created but unusable - google-app-engine

I have been getting an intermittent problem using the App Engine Channel API. For the most part, maybe 90% of the time, everything works fine. But the remaining 10% of the time I get a channel that is unusable. Having looked at this code for months, I strongly believe that this problem is not due to a logical error. By unusable channel I mean that even though the client connects to it successfully, the server is not able to message it. Most of the operations involved on the client and server complete successfully:
On the server, I create a channel with a new client id unique to the session
The client fetches the corresponding token and connects to it
On the client, onOpen() is called on the channel socket
The one thing that doesn't succeed is the calling of /_ah/channel/connected for these defective channels. I've tried dozens of possible workarounds without success. Right now I deal with the problem by gracefully retrying till I succeed, but it would be really nice for it to work without these tricks.
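The retry workaround described above might look roughly like the following minimal sketch. It is not the asker's actual code: `open_channel` and `is_connected` are hypothetical callables standing in for "create a channel and hand the token to the client" and "has /_ah/channel/connected been seen for this client id", and the backoff values are arbitrary assumptions.

```python
import time


def retry_until_connected(open_channel, is_connected,
                          max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Re-create a channel until the server side confirms the connection.

    open_channel and is_connected are hypothetical callables supplied by
    the application; they are not part of any App Engine API.
    Returns (channel, number_of_attempts_used).
    """
    for attempt in range(max_attempts):
        channel = open_channel()
        if is_connected(channel):
            return channel, attempt + 1
        # Back off exponentially before trying again, but do not sleep
        # after the final failed attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError(
        "channel never became usable after %d attempts" % max_attempts)
```

Capping the number of attempts keeps a persistently broken channel from retrying forever, and passing `sleep` in as a parameter makes the helper easy to exercise without real delays.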

I haven't seen any code, but from what you are saying, could it be related to
Intermittent error code 400, description “” on client connecting to channel
I am using a kind of brute-force loop that messages all client sockets (even if they have been closed; it's a bit redundant, but the overhead seems low) and haven't picked up any problems yet (I also haven't tested it that well either).

It seems that they fixed a leak in the Channel API in the last release, 1.8.2: https://code.google.com/p/googleappengine/issues/detail?id=9283
https://code.google.com/p/googleappengine/wiki/SdkForJavaReleaseNotes

Related

Connection tab of Google Cloud SQL instance taking forever to load the console interface

I want to access my cloud database from my computer, but the connection tab never finishes loading, so I cannot enter my IPv6 address. This is the second time I am experiencing this issue, and my network is strong enough. It's now been 20 minutes, but the three dots are still indicating progress that never ends.
The first time it happened I had to leave my computer and go for a walk. This really frustrates me since it's in production and rapid updates should not be delayed.
How can I fix this?
POSSIBLE CAUSE:
It happens after I re-open MySQL Workbench and it fails to connect, the reason being that my IPv6 address has been changed, possibly by my Internet Service Provider (ISP) (I don't know of other possible reasons). After MySQL Workbench fails, I go to the console to enter the new address, but this problem occurs.
I think Cloud SQL security (I don't know the exact name) is treating this as a malicious access attempt and therefore imposing this weird delay on immediate subsequent access. If so, this is purely impractical for business, since my computer does not tell me that my IPv6 address has changed. Besides, normal, regular IPv6 address changes shouldn't be treated as malicious, or developers will continue to suffer from this issue.
EDIT: This time it finished loading after approximately 50 minutes.
Have you considered using the Cloud SQL proxy to connect to your instance instead of white-listing an IP? White-listing an IP can be insecure since it provides anyone on your network access, and inconvenient (as you have discovered) because if your IP changes you lose access.
The proxy uses a service account to provide authenticated access to your instance, so it will work regardless of your IP (as long as your service account has the correct permissions). Check out these instructions for a guide on starting it up.
(As a side note, it's difficult to tell why your connectivity tab is failing to load. It might be a browser add-on or even a failure in your local network that is interfering. You can check the browser dev console to see if any errors appear.)

App Engine silently fails on some requests

Some requests silently fail in my python app, intermittently and unpredictably. The hallmarks of the failure are:
Request returns a 200, so the client doesn't know there's a problem.
Request does NOT successfully execute on the server.
No logging statements are recorded for the request.
Below is an example from my logs of a bunch of requests which are each supposed to write an entity to the datastore. You can see for the lower, successful request, a blue 'i' is present, indicating that info level logs were recorded. When I examine the datastore, an entity was successfully written for this request.
However, for the failed request, you can see there is just a white box, and there are no logging statements present at all. While the server returned a 200, no entity was written to the datastore for this request.
Has anyone encountered something like this before on App Engine? Any ideas on how to debug it? I've seen it in multiple different apps myself, but I've never been able to figure it out.
EDIT
To clarify, the main problem here is that code doesn't execute, as measured by the failure to write an entity. The spurious 200 and lack of logging is an associated symptom.
From a comment originally, but seems to be the resolution path for this issue:
Given that there are no log statements at all in the line and you appear to unpack the arguments and log them as soon as you enter the handler, this starts to look like an infrastructure/platform issue.
In such a case, it's best to open a public issue tracker issue, with "Type-Production" as a tag, including your app's app id and a timeframe, and as much information about your app and request handler involved as possible, and platform support will pick up the issue in the course of triage.
That said, it's worth examining the handler to make absolutely sure there's no way you could be exiting from the handler and sending a 200 without logging anything or seeing an exception. It all depends on what the code handling the request is capable of, what stack of libraries it's built upon, etc.
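One way to rule out a silent exit from the handler is to wrap it so that entry, exit, and any exception are always logged. The sketch below is a generic illustration using only the standard `logging` and `functools` modules; the decorator name `log_entry_and_exit` and the logger name `request_audit` are made up for this example, and it does not assume any particular web framework.

```python
import functools
import logging

# Logger name is an arbitrary choice for this illustration.
logger = logging.getLogger("request_audit")


def log_entry_and_exit(handler):
    """Log when a handler starts, finishes, or raises."""
    @functools.wraps(handler)
    def wrapper(*args, **kwargs):
        logger.info("enter %s", handler.__name__)
        try:
            result = handler(*args, **kwargs)
        except Exception:
            # Record the traceback, then re-raise so behaviour is unchanged.
            logger.exception("error in %s", handler.__name__)
            raise
        logger.info("exit %s", handler.__name__)
        return result
    return wrapper
```

If a request shows a 200 but neither the "enter" nor the "exit" line appears in the logs, the handler code was most likely never reached, which points back toward the platform rather than the application.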

OData HTTP400 Timeout Error

This is one of the most bizarre problems I've come across since I started using OData for my mobile apps. The OData server I've developed is backed by SQL Express 2008 and this combination has been installed on 50 different servers and/or PCs over the last 15 months. All 50 servers have been running stable with consistent function for large amounts of data.
A couple of days ago one of my clients contacted me indicating that my client app (running on iOS7) was having an odd error come up when POSTing data to their server. The error had an HTTP code of 400 and the error text is "The operation couldn't be completed. (Timeout error 400.)". My first question is: why is a timeout error coming back with a 400 code? Generally when I get timeouts (due to firewall, etc) they're in the 100x range. There is no indication in the event logs on the server of ANY problems occurring. My own logs (stored in the SQL database) show no error (which is odd because I'm using the generic exception catching method in my OData service to log any problems). I haven't got to the step of adding logging of all requests as yet.
The error is only being raised when posting one particular set of data. All other posts from the device are functioning perfectly. I got the client to re-install the app (deleting all data) and then to download the data set that was causing the error. The download worked fine. We began making changes to the data to replicate what the data looked like when the error occurred in incremental changes, posting the change to the server and observing the result. Most of the incremental changes work fine but certain combinations cause the error to occur. One of the increments involves a large volume of changes and that posts fine, but subsequent alteration of any of the objects (sometimes altering as little as 6 characters in a text field) cause the error to occur. And yet in some circumstances altering objects that have already been posted to the server works without a problem.
I wiped the service components from the server and undertook a fresh install. I shifted TCP ports in case 443 had another listener causing problems. I reset the server. None of these change the behaviour of the error.
My last ditch solution is to completely re-install IIS and .NET Framework but I'd obviously like to avoid this as it's not my server... The server is overseas from my current location so debugging isn't really an option. Hoping someone has an idea as to what I can do diagnostically to try and determine the source of this bizarre 'gremlin'.
Have you tried a more thorough traffic analysis using a tool like Fiddler? The "timeout" error does indeed seem odd, and what stood out from your post was that your server was "overseas". Could there be something with the "times" that are being used/generated, e.g. server time, local time, etc.?
Just to confirm, the "same" exact set of data always fails? Can you replicate this via a remote debugger or via localhost? If so, can you turn on "verbose errors"?

Detecting client-aborted requests in AppEngine

I have a request in AppEngine that takes a little while to complete (many seconds). Is there a way to detect whether the user or some network problem has already aborted the request? This would allow me to save myself the server-load of continuing the result generation, which won't go anywhere anyways.
I tried the following in Dev-Mode, but neither worked (haven't checked yet whether it behaves differently in production mode):
Checking whether resp.getOutputStream completes without throwing an IOException
Checking whether there was an Interrupt sent to the servlet thread
Thanks, Markus
PS: I am really specifically interested in this question, not in ways to restructure my app to make the request faster or prevent aborts or other things.
I don't know if that is possible at all on App Engine. App Engine doesn't allow in-progress requests: the response is sent to the client after the handler/servlet has returned.
No, there is no way to detect this from inside the app. I wouldn't worry about it.
Way late but this may be useful. In Golang you can detect interrupts using the Context package.
Here is a useful video of Francesc Campoy explaining it:
https://www.youtube.com/watch?v=LSzR0VEraWw

Force Channel API to poll

Hopefully Moishe sees this: in development mode, the channel api client (javascript) resorts to polling... and uses a very fast polling rate. After poking around I found that if I set
goog.appengine.Socket.POLLING_TIMEOUT_MS = interval;
I can control the polling rate. What I'm wondering is:
How do I know if/when the client is going to go into "poll mode" in production?
Is it possible to force the client into "poll mode"?
What happens if I reach the channel quota for my app? will the /_ah/channel/ endpoint just stop working altogether? or will it resort to polling?
-Thank you
Answers:
The client will never go into polling mode in production. The implementation is completely different in prod.
See above
The call to create_channel() will fail and you won't be able to get any more tokens. Existing tokens (and hence channels) will work until they time out.
Hope that helps!
-Moishe
