Random 500 errors on AppEngine - google-app-engine

I have a fairly big application that recently underwent a major overhaul.
The newer version uses a lot of JSONP calls, and I am noticing 500 server errors. Nothing appears in the Logs section that would indicate the cause. It happens on JS files, PNGs and even Jersey servlets.
Searching SO and the groups suggested that these errors are common during deployment, but they still happen hours after deployment.
BTW, the application has become slightly bigger, and in rare cases it even causes a deadline exception while starting instances. Sometimes an instance starts and serves within 6-10 seconds. Sometimes it takes more than 75 seconds, which causes a timeout for a similar request. I see the same behavior for warmup requests too. Nothing custom is loaded during app warmup.

I feel like you should be seeing the errors in your logs. Are you exceeding quotas or getting deadline errors? Perhaps there is an error in your error handler itself, such as its file not being found, or the path to the error handler overlapping with another static file route?
To troubleshoot, I would implement custom error pages so you could determine the actual error code. I'm assuming Python since you never specified what language you are using. Add the following to your app.yaml and create static html pages that will give the recipient some idea of what's going on and then report back with your findings:
error_handlers:
  - file: default_error.html
  - error_code: over_quota
    file: over_quota.html
  - error_code: dos_api_denial
    file: dos_api_denial.html
  - error_code: timeout
    file: timeout.html
If you already have custom error handlers, can you provide some of your app.yaml so we can help you?

Some 500s are not logged in your application logs because they are failures at the GAE front end. If, for some reason, you have a spike in requests and new instances of your application cannot be started fast enough to serve them, your client may see 500s even though those 500s do not appear in your application's logs. The GAE team is working to provide visibility into those front-end logs.

I just saw this myself. I was researching logs of visitors who only loaded half of the graphics files on a page. I tried clicking the same link on a blog that they used to get to our site. In my case, I saw a 500 error for a JS file in the Chrome developer console, yet the GAE logs said the file was served correctly with a 200 status. That JS file loads other images, which did not load. In my case, it was an HTTPS request.
It is really important for us to know our customer experience (obviously). I wanted to let you know that this problem is still occurring. Just having it show up in the logs would be great, even with a warm-up error attached to it or something, so we know it is an unavoidable artefact of a complex server system (totally understandable). I just need to know if I should be adding instances or doing something else. This error did not wait for 60 seconds, maybe 5 to 10 seconds. It is as if the round trip for the SSL handshake failed in the middle but the logs showed it as a success.
So can I increase any timeout for the handshake, or is that done on the browser side?

Related

Intermittent authorization failure on publish

I am seeing an odd intermittent authorization failure on publish. My publisher is running on App Engine Standard (Python). Because of that, I am using the "old" python client library. So the code looks like this:
from googleapiclient.discovery import build
build('pubsub','v1').projects().topics().publish(topic=topic,body=body).execute()
This works just fine. The identity gets picked up and everything is authenticated. However, again intermittently, it will stop working and I get 403 forbidden errors. Then later it will start working again (even with the same topic and body). In the meantime, no code changes, no deployments.
I have had to wrap the publish to catch this error, throw it on a task queue and have the request repeat with decaying frequency until it finally starts working again a few hours later. This is OK in the very short term, but obviously this will not work for us.
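The workaround described above (catch the error, then repeat the request with decaying frequency) can be sketched roughly as follows. This is only an illustration: `publish_with_backoff`, `is_retryable` and the delay values are made-up names and numbers, and the sketch retries in-process with a sleep instead of re-enqueueing on a task queue.

```python
import time


def publish_with_backoff(publish, is_retryable, max_attempts=5,
                         base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call publish() until it succeeds, doubling the wait after each failure.

    publish       -- zero-argument callable that performs the request
    is_retryable  -- callable(exc) -> bool, decides whether to retry
    All names and defaults here are hypothetical, chosen for illustration.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return publish()
        except Exception as exc:
            # Give up on the last attempt or on errors we should not retry.
            if attempt == max_attempts or not is_retryable(exc):
                raise
            sleep(delay)
            # Exponential backoff, capped so the wait never exceeds max_delay.
            delay = min(delay * 2, max_delay)
```

With the client library from the snippet above, `publish` would wrap the `...publish(topic=topic, body=body).execute()` call, and `is_retryable` could check for a `googleapiclient.errors.HttpError` whose `resp.status` is 403.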
To summarize, this is on the publish side, GAE Standard ... it works, then stops working, then works again.
Thanks for any insight or help.
It turns out, of course, that in fact there were deployments when I wasn't aware. So I thought, "no code change - no deployments", but there were deployments. And the issue was that the person making these deployments had an old library (or other dependency) for google_api_python_client. Once corrected, pubsub is working just fine.

Server gives 503 on almost all images

I have a WordPress site that has suddenly become extremely slow, especially on mobile devices. I often get this error:
Service Unavailable
The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.
Additionally, a 503 Service Unavailable error was encountered while trying to use an ErrorDocument to handle the request.
I checked my server log and can see a huge number of 503 errors.
As I understand WordPress, every time you upload a picture, it generates several different sizes. I looked on my FTP server and can see that each picture is generated in the sizes below:
Kajak-i-Norge.jpg
Kajak-i-Norge-60x40.jpg
Kajak-i-Norge-60x40#2x.jpg
Kajak-i-Norge-100x100.jpg
Kajak-i-Norge-120x80.jpg
Kajak-i-Norge-150x150.jpg
Kajak-i-Norge-300x200.jpg
Kajak-i-Norge-300x300.jpg
Kajak-i-Norge-394x263.jpg
Kajak-i-Norge-394x303.jpg
Kajak-i-Norge-394x394.jpg
Kajak-i-Norge-394x400.jpg
Kajak-i-Norge-600x400.jpg
Do those 503 errors mean that the picture is not found on the server?
There are 3 possible causes that I think could be producing this problem:
1 - Something is wrong with the hosting, especially if it is shared
2 - Someone is attempting a denial of service
3 - You have so many visitors that the server is gradually getting slower
The 503 error means that the server can't serve up the image, not that it's not found on the server. 404 errors are not found. The fact that you're seeing both of these errors on your images is most likely the result of server resources being overloaded (as other answers have mentioned)...either capacity or bandwidth. If you're on shared hosting, this can happen often. Your site loaded fine for me just now, but of course, that can change the second your server gets bogged down.
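To check which of the two is actually happening, the status codes in the access log can be tallied. Here is a minimal sketch, assuming the log uses the common Apache log format, where the status code follows the quoted request line. The sample lines and the `count_statuses` name are invented for illustration.

```python
import re
from collections import Counter

# In the common/combined log format the status code is the first
# three-digit field after the quoted request line.
STATUS_RE = re.compile(r'"[^"]*" (\d{3}) ')


def count_statuses(lines):
    """Return a Counter mapping HTTP status code -> number of requests."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Invented sample lines, not taken from the poster's log.
sample = [
    '203.0.113.5 - - [10/Oct/2016:13:55:36 +0200] "GET /wp-content/uploads/Kajak-i-Norge-300x200.jpg HTTP/1.1" 503 299',
    '203.0.113.5 - - [10/Oct/2016:13:55:37 +0200] "GET /wp-content/uploads/missing.jpg HTTP/1.1" 404 210',
    '203.0.113.9 - - [10/Oct/2016:13:55:38 +0200] "GET /wp-content/uploads/Kajak-i-Norge-600x400.jpg HTTP/1.1" 503 299',
]
print(count_statuses(sample))
```

If the 503 count dwarfs the 404 count for files that do exist on disk, that points at an overloaded server instead of missing images.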
A free cloudflare account is a good recommendation from Dan Web above...it will at least rule out the chance it's spam bots hogging up your resources.
Intermittent issues like this are almost always shared hosting problems. Your host will also rarely accept responsibility for the issues, and you're left having to decide whether to move your site to a different hosting solution.

Chrome: POST/OPTIONS requests fail with net::ERR_TIMED_OUT

The OPTIONS/POST request fails intermittently with an ERR_TIMED_OUT error in the console. It only happens sometimes; otherwise the request gets a proper response from the back end. When it times out, the request doesn't even reach the server.
I did some research and found that, because of the limit of 6 connections per host, a request may wait for a connection to be released. But I don't see any other requests pending; all the other requests had completed.
In the timeline I can always see that it stalled for 20.00 seconds. The time is the same most of the time. But the timeline only shows that it was stalled for some time, nothing else.
The status of the request shows failed with ERR_CONNECTION_TIMED_OUT. Please help.
(Screenshots: the network timing and the console error.)
I've seen this issue when I use an authenticated proxy server and usually a refresh of the page fixes it.
Are you using an authenticated proxy server where you are seeing this behavior? Have you tried on a pc with direct access (i.e. without proxy) to the Internet?
I had the same problem when I switched to another ISP. I thought I would only have to enter my new ID and password, but that wasn't the case.
I have an ADSL modem with a dry loop.
All other services were fine (DNS resolution, IP telephony, FTP, etc.).
I did a lot of tests (disabling the firewall, trying other browsers, trying under Linux, resetting the modem to factory defaults, etc.); none of them was successful.
To resolve the ERR_TIMED_OUT problem, I had to adjust the MTU and MRU values. I set them to 1458 rather than 1492, which is the default value.
It works for me. Maybe some ISPs use different values. Good luck.
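For context on those numbers: 1492 is the usual MTU on PPPoE links (which an ADSL line like this presumably uses), because PPPoE takes 8 bytes out of the 1500-byte Ethernet payload. The sketch below just does that arithmetic and derives the resulting TCP maximum segment size, assuming IPv4 and TCP headers of 20 bytes each with no options. The function names are invented for illustration.

```python
ETHERNET_MTU = 1500     # standard Ethernet payload size in bytes
PPPOE_OVERHEAD = 8      # 6-byte PPPoE header + 2-byte PPP protocol field
IPV4_HEADER = 20        # without options
TCP_HEADER = 20         # without options


def pppoe_mtu(link_mtu=ETHERNET_MTU):
    """MTU left for IP packets once PPPoE framing is subtracted."""
    return link_mtu - PPPOE_OVERHEAD


def tcp_mss(mtu):
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


for mtu in (pppoe_mtu(), 1458):
    print("MTU %d -> MSS %d" % (mtu, tcp_mss(mtu)))
```

One common explanation for why a lower value helps, though it is not confirmed for this particular ISP, is that something on the path cannot carry full-size packets and the ICMP messages that would normally signal this are being dropped, so large requests simply stall.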

Google App Engine + Jersey + IntelliJ Idea

Okay, so I'm having an annoying issue here. I am clearly missing something, and I have already been frustrating myself with this for the last 4-5 hours.
I cannot seem to get a simple GAE + Jersey project up and running with IntelliJ.
The frustrating part is that I do not get any errors; the REST service is simply not present after deployment.
I've tried a few basic things, basically everything you can find on Google (and here).
After some frustration I downloaded this project: https://github.com/BluerockInteractive/GAE-Jersey-Guice-Sample
Just because it lists the available servers and has a deeper logging level.
Now here's the output of the AppEngine startup:
https://docs.google.com/open?id=0B42XvjSlpDCtTTdwQl9MSTBlQ0U
So my problem: AppEngine is up and running. If I create a basic servlet, everything works fine. However, I cannot get Jersey to work. Whatever I try, it always returns:
HTTP ERROR 404
Problem accessing /jerseyguicesample. Reason:
NOT_FOUND
Powered by Jetty://
Any ideas, what I could be missing here?
Here's an image of the IDE and the artifact settings: http://img51.imageshack.us/img51/7913/20120428124133.png
Thanks, :D
This example has REST handlers mapped to /rest/test, /rest/players/{name}/xml, /rest/players/{name}/json and /rest/hello.
What (and where) did you map /jerseyguicesample to?

Silverlight WSOD at remote client site

I have a client who recently installed our Silverlight app. It works fine from their server itself, but when they try to run it from a client machine, they can log in through the aspx login page, but on the main page, which hosts the tag and the .xap file, they see absolutely nothing!
I cannot see their screens; I just get occasional screenshots via email and cross my fingers that they are typing the URL I tell them to. Even Shareview is not working for them: they can see my screen but I cannot see theirs.
So I am pleading: help! Please throw out some wacky ideas. I just learned an hour or so ago that they did not even have Silverlight installed, so the morning's debugging effort was a waste of time. So who knows what the next fascinating source of problems is?
Here is the user-agent info. Our app is .NET 4.0; could that be the problem? To my untrained eye it does not look like the client supports 4.0 (from the web server log):
Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322;+.NET+CLR+2.0.50727;+.NET+CLR+3.0.04506.30;+.NET+CLR+3.0.04506.648;+.NET+CLR+3.0.4506.2152;+.NET+CLR+3.5.30729) 401 3 5 0
Getting closer - I see that the GET request for the XAP is returning 401, which is Unauthorized, not "not found"! What would do that?
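The four numbers at the end of that log line look like the trailing fields of an IIS W3C log entry. Assuming the usual field order (`sc-status`, `sc-substatus`, `sc-win32-status`, then one more counter such as `time-taken`), they can be decoded as sketched below. The field order is an assumption about this server's logging configuration, `decode_tail` is an invented name, and the lookup tables only cover the values seen here.

```python
# Hypothetical decoder for the trailing fields of an IIS W3C log line.
# Assumed field order: sc-status sc-substatus sc-win32-status <one extra field>.
HTTP_STATUS = {401: "Unauthorized", 404: "Not Found"}
IIS_401_SUBSTATUS = {3: "access denied by the ACL on the requested resource"}
WIN32_STATUS = {0: "success", 5: "ERROR_ACCESS_DENIED"}


def decode_tail(line):
    """Decode the three status fields that precede the last field."""
    status, substatus, win32 = (int(x) for x in line.split()[-4:-1])
    if status == 401:
        detail = IIS_401_SUBSTATUS.get(substatus, "?")
    else:
        detail = "?"
    return {
        "status": "%d.%d %s" % (status, substatus, HTTP_STATUS.get(status, "?")),
        "substatus": detail,
        "win32": WIN32_STATUS.get(win32, "?"),
    }


print(decode_tail("... +.NET+CLR+3.5.30729) 401 3 5 0"))
```

If that reading is right, the server found the .xap but refused to serve it because of file-system permissions, which would be worth checking alongside the firewall.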
It's very possible that they are blocking XAP files either directly or indirectly at the firewall. XAP files are simply ZIP files and inspection-based firewalls tend to look at these as a security risk. You may want to see if they can setup an exception in their rules.
Hmm..
1) Check that your client is accessing the right URL. Request their IP address, then check the web server logs, i.e. are they requesting the right URL, and are they downloading the xap? (Check whether they use a proxy, too.)
2) Check that they restarted their browser after installing the Silverlight runtime.
3) Do you handle the UnhandledException event of your Application class? If not, use it to send detailed exception logs to your server when the application crashes.
4) Use JavaScript to initialize your Silverlight application. This way, you can be notified if the runtime fails to start (for example, if it failed to load the xap file). You can use Ajax to report the issue to the web server.
