Constant disconnects due to channels going stale for no reason - google-app-engine

Ever since the latest release a few days ago, our users have been constantly disconnected because channel tokens go stale within minutes of being created. Our tokens are set to last for 5 hours, but we're lucky if they last 5-10 minutes, and when the channel closes we cannot even reconnect with a new channel token until the user refreshes the page.
A JavaScript error triggers it. It looks like this:
NetworkError: 400 Unknown SID - http://89.talkgadget.google.com/talkgadget/dch/bind?VER=8&clid=C9C2EFC06C7C5163&gsessionid&prop=data&token=AHRlWrrWl611ZMMDw8Apgi5vdYuS9UslofxEiJI47-2n4rkPgmuu1z0AN-UNQcyNEvhck-AYAMSLPru8Aumooz62hYNNbLTbi1a3lTSAzGEyj6TsXZirJYE&RID=rpc&SID=BEBDEFDA92C6A9F7&CI=0&AID=54&TYPE=xmlhttp&zx=gsjg8mb1i987&t=1
Then, in Firefox Firebug, the console gets spammed infinitely with
channel name mismatch; message ignored
Until a refresh occurs.
Our site is a real-time interactive site with chat. Our users are sending us emails upset that they keep getting disconnected. They're leaving the site. This is costing us not only goodwill with our user base, but also money and we are powerless to do anything because the bug is with Google App Engine.
Please fix this or roll back to the previous build immediately until you figure this out. The latest build is broken.

I haven't been able to reproduce this but I'm still looking at it. In the meantime: if you explicitly call socket.close() after receiving the error, can you then create a new Channel object and reconnect? If that doesn't work, you could even try manually removing the element with id "wcs-iframe" itself from the DOM. You should be able to use the original token when doing this instead of fetching a new token.
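
The sequence suggested above could be sketched roughly as follows. This is only a sketch: `reconnect`, `handlers`, `ChannelCtor` and `doc` are hypothetical names, and the constructor and document are passed in as parameters (an assumption, so the logic can be exercised outside a browser). For simplicity it always removes the iframe instead of trying that only when closing alone fails.

```javascript
// Sketch only: reconnect after a channel error, reusing the original token.
// `ChannelCtor` stands in for goog.appengine.Channel and `doc` for document;
// passing them in is an assumption made so the logic can run outside a browser.
function reconnect(oldSocket, token, handlers, ChannelCtor, doc) {
  // 1. Explicitly close the broken socket; ignore failures from a dead socket.
  try {
    oldSocket.close();
  } catch (e) {
    // Nothing more to do if the old socket is already gone.
  }

  // 2. Fallback suggested above: remove the Channel API iframe if it lingers.
  var iframe = doc.getElementById('wcs-iframe');
  if (iframe && iframe.parentNode) {
    iframe.parentNode.removeChild(iframe);
  }

  // 3. Create a fresh Channel object with the same token and open it.
  var channel = new ChannelCtor(token);
  return channel.open(handlers);
}
```

In a page this would be called from the error handler along the lines of socket = reconnect(socket, token, handlers, goog.appengine.Channel, document).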

Related

Expunged messages are not removed from trash, so expunge does not (always) permanently remove them

I have a client app that listens for messages added to and removed from certain mailboxes and folders.
It can delete a message, by which I mean marking it as DELETED and expunging it.
Sometimes a user logs in to their webmail client and deletes a message (by which I mean moving it to the trash; I'm not talking about GMail, which behaves differently).
When that happens, the first expunge called by my code returns an array of expunged messages that also contains the messages deleted from the web client. Why?
If that is correct, shouldn't the expunged messages also be permanently removed from the trash? After the expunge, the messages the user removed via the web client still exist in the trash.
Is this normal? I am a bit confused about how expunge works...
Thanks

Instance delete: There is an operation pending for this application. Please wait and try again

One of my instances in GAE Standard (Java) is stuck in a strange state. Trying to delete it has been failing with "There is an operation pending for this application. Please wait and try again" for a long time now. Is there any resolution for this short of redeploying a new version?
Interesting:
Error mapping custom domain on Appengine: That poster has the same error with a different task, and it also started just now. Google status says everything is OK, but it's an interesting coincidence.

Too many internal channel errors while using Channel API

We use Channel API in our Google App Engine application to send updates to our users. The code to send updates is something like this
for (String clientID : listOfClientID) {
    channelService.sendMessage(new ChannelMessage(clientID, stringMessage));
}
Over the past few weeks, we've been getting too many exceptions in this method: around 150 exceptions over an 8-hour peak usage period.
com.google.appengine.api.channel.ChannelFailureException: An internal channel error occured.
The loop can have 500-3000 iterations. Is it a problem when ChannelService tries to send a message to a channel that has been closed? If I remove closed channels from the list, will that completely solve the problem? Please note that this large number of exceptions has been a trend only in the past few weeks, and we've been using the Channel API for several months.
It turns out that the problem was the server trying to send messages to channels that had expired. The error rate has gone down considerably since I made sure that no longer happens.

Channel API channel gets disconnected without onclose or onerror calls. JavaScript console has logs of failed HTTP calls to talkgadget.google.com

I have implemented Google App Engine's Channel API in my application. Everything runs smoothly. I create new channels every hour for every user. I have managed to maintain one channel per session (the same channel for different tabs in a browser). I have implemented the onerror and onclose methods so that every time they are invoked, a call is made to the server requesting a valid token.
Sometimes, after the channel's been alive for a while, it gets disconnected. I can see failed HTTP calls to talkgadget.google.com on the JavaScript console. The URLs are something like this:
https://129.talkgadget.google.com/talkgadget/dch/bind?VER=8&clid=.....
These calls have responses like "401 (Token timed out)" or "401 (Token invalid)".
That is indeed true: the token used by the client is invalid. It should be replaced with the new token, but neither onerror nor onclose is invoked. How am I supposed to figure out when this happens, or how to handle it? There is no real way to tell whether a client is disconnected except through the onerror or onclose methods. The issue goes away if I refresh the page (I get the valid token from the database every time the user refreshes).
I checked the socket object's "readyState" property and it had the value 1. Many people face this issue and, to date, there seems to be no valid solution offered by the folks at GAE.
Edit: I'm a premium account holder and this issue is holding back our deployments.
Edit 2: Having one channel per tab reduces the frequency of this happening. But it doesn't solve the problem completely.
It has been six days since I posted the question and there has been no response from the AppEngine team or any other users.
The workaround I applied was to have a button on the site that would fetch the (valid) token from the database, close the channel and then open it again with the token received.
Sometimes it's a new token that should have been received earlier; sometimes it's the same token that had been valid all along.
I agree that this issue cannot be reproduced often, but when it happens, it causes a lot of damage. I hope I find a solution soon.

Intermittent error code 400, description "" on client connecting to channel

My Google App Engine app, which uses the Channel API, works well some of the time. Intermittently, though, the JS code connecting to the channel generates an error. In socket.onerror, the error code is set to 400 and the description is set to an empty string. I have checked that the token being used to connect is valid. I also tried recreating the channel in socket.onerror, first calling socket.close(), but that does not seem to work. Often there is a series of failures before a success. The client JS is running on Safari on iOS. Any ideas on how to fix or work around the problem are welcome. Right now, my best workaround is to keep trying until I succeed, increasing the interval between attempts on each failure. The server-side presence API does not help, since the 'connected' hook is not called reliably.
This is a known issue, http://code.google.com/p/googleappengine/issues/detail?id=4940, and it has been accepted. As you can see, the issue's status is not yet fixed. Feel free to star it.
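
The retry workaround described above (keep trying, with a longer interval after each failure) might look something like this. It is a sketch under assumptions: `backoffDelay`, `connectWithRetry`, `connect` and `schedule` are hypothetical names, and the 1 second base and 60 second cap are illustrative values, not anything specified by the Channel API.

```javascript
// Sketch only: delay before the next reconnect attempt, doubling per failure.
// baseMs and maxMs are illustrative defaults, not values from the Channel API.
function backoffDelay(attempt, baseMs, maxMs) {
  baseMs = baseMs || 1000;
  maxMs = maxMs || 60000;
  // attempt 0 -> base, attempt 1 -> 2 * base, ... capped at maxMs.
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Calls connect(onSuccess, onFailure) until it succeeds, waiting longer each time.
// `connect` and `schedule` are hypothetical hooks; schedule defaults to setTimeout.
function connectWithRetry(connect, schedule) {
  schedule = schedule || setTimeout;
  var attempt = 0;
  function tryOnce() {
    connect(
      function onSuccess() {
        // Reset so a later disconnect starts again from the short delay.
        attempt = 0;
      },
      function onFailure() {
        var delay = backoffDelay(attempt);
        attempt += 1;
        schedule(tryOnce, delay);
      }
    );
  }
  tryOnce();
}
```

Here `connect` would open the channel and wire onSuccess to onopen and onFailure to onerror. Capping the delay keeps a long outage from pushing retries arbitrarily far apart.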
I know double posting is bad (issue starred & comment posted)... but I suspect this thread might get more attention than the issue comments ^^
As far as we are concerned, it's at the very least a documentation issue:
https://developers.google.com/appengine/docs/java/channel/javascript still states "An onerror call is always followed by an onclose call and the channel object will have to be recreated after this event".
As far as we can tell, that is only true for error codes 400 and 401 (which are strings, not numbers, btw, so beware of === in the JS code).
It is untrue for other error codes (we have logged at least the -1 code).
There should be documentation covering all error codes and how each is expected to be handled.
Atm, we have a "channel manager" that reuses the same channel token when the code is not 400 or 401, and that makes sure onclose is called once and only once per Socket.
Before that, we were trying to close properly and reopen (new underlying Socket) with a shiny brand-new token: usually we got an error 400 followed by an error -1.
FYI, we first detected this behavior on iOS, quite recently (regression ftw? Before that iOS was dandy). Reopening the socket after a code -1 is not a panacea: sometimes it will succeed (onopen properly called) and then fail silently (no message received, no onerror called).
Generally, we also noticed more consistent behavior on desktop browsers than on mobile ones, across all user agents and platforms (more on that: yay! Other issues incoming! Especially Android...)
OK, this post might have been useful after all. Thx!
[EDIT: corrected a mistake... we don't reuse the channel object nor the socket object, only the token]
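
Two pieces of the "channel manager" described above can be sketched as small helpers. This is not the poster's actual code: `needsNewToken` and `once` are hypothetical names, and the rule that only 400 and 401 call for a new token is taken from the observations above, not from official documentation.

```javascript
// Sketch only: decide whether a new token is needed after a Channel API error.
// Per the observations above, codes 400 and 401 arrive as strings, so compare
// via String() rather than === against a number.
function needsNewToken(error) {
  var code = String(error && error.code);
  return code === '400' || code === '401';
}

// Wraps a close handler so it runs at most once per socket, whether it is
// triggered by onclose itself or invoked manually after onerror.
function once(fn) {
  var called = false;
  return function () {
    if (called) { return; }
    called = true;
    return fn.apply(this, arguments);
  };
}
```

An onerror handler could then fetch a fresh token only when needsNewToken(err) is true and otherwise reopen with the token it already has, passing the same once-wrapped function as onclose and calling it directly when no onclose follows.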
I contacted Google support about this issue.
An error 400 happens because of a timeout (one minute, it seems). This timeout causes a disconnection (the "disconnected" URL is called, and you should remove the client id from the database).
Then a new channel must be created with a new client id.
But that is not enough. We also have to run this jQuery statement: $('#wcs-iframe').remove();
It goes inside the JS onerror function, before trying to recreate the channel.

Resources