Google App Engine - Sudden increase in latency

We have seen a sudden increase in latency in our application on Google App Engine within the past few hours. The logs show that requests fail with the message "Request was aborted after waiting too long to attempt to service your request.", with no stack trace or any other relevant information. Users get an empty page with the message "Rate exceeded.". No changes have been made to the application that correlate with this spike in latency.
The application is effectively down, with no information from App Engine that can help point to the source of the latency.
We have filed an issue in the issue tracker, but have had no luck getting a response yet.
Does anyone have ideas on what we could do to deal with this kind of situation?
Update
The problem went away after 3 hours as suddenly as it came, and without any intervention on our part. Since there is consensus on min_idle_instances, we have decided to leave all the settings as they have always been so that we can see whether this ever happens again. If it does, we will have an opportunity to test this by making the suggested changes, and we will post an update here.
Here is a screenshot of the entire incident:

The comment that @Parth Mehta added is useful, and it made me think about what could be causing your issue.
I suspect your increased latency is due to not having idle instances ready as requests increase and come in: when traffic rises, it takes a while for new instances to become ready, and that wait could well be the cause of your latency.
Setting enough min_idle_instances might alleviate the 500s, as those instances would be warm and ready for the requests.
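For reference, here is a minimal app.yaml sketch of that setting for a Python 2.7 standard-environment app; the runtime, handler script and the value of 1 are assumptions, so tune them to your own application and traffic:

```yaml
# app.yaml (sketch; runtime, script name and values are assumptions)
runtime: python27
api_version: 1
threadsafe: true

automatic_scaling:
  min_idle_instances: 1   # keep at least one warm instance ready for sudden traffic

handlers:
- url: /.*
  script: main.app        # hypothetical WSGI entry point
```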
If this doesn't solve your issue I would recommend creating a case with GCP Support and we will surely be able to assist you more.
Try it and let us know.

Related

GAE Occasionally stops serving for 1-5 minutes

Starting about 1 week ago, my app will occasionally and randomly completely stop serving for 1-5 minutes. Requests during this time hang for the full timeout and then return a 500.
The System Status dashboard reads OK, I have no cron jobs or anything special that might cause this disruption (that I know of).
Has anyone experienced this, and is there a solution?
If you have 'threadsafe: false' in your app.yaml configuration, App Engine will not send concurrent requests to your app. If you have a request that's blocking for a really long time, all other requests coming in will line up (and possibly time out) before being serviced. If this is the cause of your problem, either make your app thread-safe or have a look in your logs to find requests that take a long time and fix them.
Alternatively, if your app gets very little traffic, your instances might be getting shut down after they've been idle for a while. If your app takes a long time to start up, that would explain the behavior you're seeing. In app.yaml, you can set 'min_idle_instances' to some value greater than zero to avoid this startup penalty.
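Both fixes live in app.yaml; a minimal sketch combining them (the values are illustrative, not a recommendation):

```yaml
# app.yaml excerpt (illustrative values)
threadsafe: true           # let App Engine send concurrent requests to each instance

automatic_scaling:
  min_idle_instances: 1    # keep a warm instance around to avoid the startup penalty
```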

Appengine responses becoming slower?

My AJAX calls to App Engine, which do some very basic logic (and do all the actual processing in the background, isolated from the frontend), tend to be at least 200% slower than they used to be, taking 3 seconds instead of one all of a sudden since about a week ago.
I am wondering if you guys have had a similar experience, or if something changed in the meantime that I am not aware of, quota-wise maybe. I am using the free quota.
Thanks
Zac
To my knowledge there is no particular change going on, but we can't be sure. However, slow response times can have multiple root causes.
If you have no traffic on your application, you might have zero instances running, so when you make a request there is the time for an instance to start up.
If you have a lot of traffic, depending on your configuration the request can take more time. You need to fine-tune whether a request waits to be handled by an "overloaded" instance or whether another instance should start.
If you use an API, maybe there is something wrong with it.
I would suggest you enable Appstats in your app; it will show you what takes time in each request, and you will quickly see whether the slowness is on your side or not.
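For a Python 2.7 app, Appstats can be enabled with a one-line builtin in app.yaml (sketch below); the recorded traces are then browsable at /_ah/stats:

```yaml
# app.yaml excerpt - turns on the Appstats RPC profiler (Python 2.7 runtime)
builtins:
- appstats: on
```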

Identify why Google app engine is slow

I developed an application for a client that uses Play framework 1.x and runs on GAE. The app works great, but sometimes it is crazy slow. It takes around 30 seconds to load a simple page, while at other times it runs fast, with no code change whatsoever.
Is there any way to identify why it's running slow? I tried to contact support but I couldn't find any telephone number or email address. There is also no response on the official Google group.
How would you approach this problem? Currently my customer is very angry because of the slow loading times, but switching to another provider is a last resort at the moment.
Use GAE Appstats to profile your remote procedure calls. All of the RPCs are slow (Google Cloud Storage, Google Cloud SQL, ...), so if you can reduce the number of RPCs or use some caching data structures, do so -> your application will be much faster. Appstats will also show you which parts are slow and whether they need attention :).
For example, I created a Google Cloud Storage cache for my application and decreased execution time from 2 minutes to under 30 seconds. The RPCs are a bottleneck in GAE.
Google does not usually provide contact support for many of its services. The App Engine slowness described here is probably caused by a cold start: App Engine front-end instances are shut down after about 15 minutes of inactivity. You could write a cron job that pings your instances every 14 minutes to keep them up.
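A sketch of such a keep-warm job in cron.yaml, assuming your app exposes some lightweight handler (the /ping URL here is hypothetical):

```yaml
# cron.yaml (sketch; /ping is a hypothetical handler that just returns 200)
cron:
- description: keep-warm ping to avoid cold starts
  url: /ping
  schedule: every 14 minutes
```

As another answer below notes, this works against App Engine's attempt to spin idle apps down, so weigh the trade-off before relying on it.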
Combining some answers and adding a few things to check:
Debug using Appstats. Look for "staircase" situations and RPC calls. Maybe something in your app is triggering RPC calls at certain points that don't happen in your logic all the time.
Tweak your instance settings. Add some permanent/resident instances and see if that makes a difference. If you are spinning up new instances, things will be slow, probably for around the time frame (30 seconds or more) you describe, and it will seem random. It's not just how many instances you have, but what combination of the sliders you are using (you can actually hurt yourself with too few or too many); a sketch of those settings appears after this list.
Look at your app itself. Are you doing lots of memory allocations in the JVM? Allocating/freeing memory is inherently a slow operation and can cause freezes. Are you sure your freezing is not a JVM issue? Try replicating the problem locally and tweak the JVM xmx and xms settings and see if you find similar behavior. Also profile your application locally for memory/performance issues. You can cut down on allocations using pooling, DI containers, etc.
Are you running any sort of cron jobs/processing on your front-end servers? Try to move as much as you can to background tasks such as sending emails. The intervals may seem random, but it can be a result of things happening depending on your job settings. 9 am every day may not mean what you think depending on the cron/task options. A corollary - move things to back-end servers and pull queues.
It's tough to give you a good answer without more information. The best someone here can do is give you a starting point, which pretty much every answer here already has.
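As a concrete starting point for the instance-settings item above, these are the app.yaml knobs the dashboard "sliders" correspond to; the values shown are only illustrative, and the right ones depend entirely on your traffic:

```yaml
# app.yaml excerpt - the scheduler "sliders" as config (illustrative values only)
automatic_scaling:
  min_idle_instances: 1          # resident instances kept warm at all times
  max_idle_instances: automatic  # cap on extra idle instances left running after a spike
  min_pending_latency: 30ms      # wait at least this long for a busy instance before starting a new one
  max_pending_latency: automatic # never let a request wait longer than this before a new instance starts
```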
By making at least one instance permanent, you get a great improvement on first use. It takes about 15 seconds to load the application into an instance, which is why you experience long request times when nobody has been using the application for a while.

App Engine loading request even when idle instance available

I have a simple app running on App Engine but I'm having odd problems with latency. It's a Python 2.7 app, and a loading request takes between 1.5 and 10 seconds (I guess depending on how GAE is feeling). This is a low-traffic site right now, so previously GAE was sitting with no idle instances and most requests were loading requests, resulting in a long wait time on the first page view.
I've tried configuring the minimum number of idle instances to "1" so that these infrequent page views can immediately hit a warm instance.
However, I've seen several cases now where even with one instance sitting unused, GAE will route an incoming request to a loading instance, leaving the warm instance untouched:
(Screenshot: GAE dashboard showing odd scheduling.)
How can I prevent this from happening? I feel I must be understanding something wrong, because I certainly don't expect this behavior.
Update: Also, what makes this even less comprehensible is that the app has threadsafe enabled, so I really don't understand why GAE would get flustered and spin up an instance for a single, lone request.
Actually, I believe this is normal behavior. Idle instances are supposed to guarantee a minimum number of instances always available (for spiky load).
So, when some requests start coming in, they are initially served by idle instances, but at the same time AE scheduler will start launching new instances to always guarantee the same amount of idle instances even during suddenly increased load. That is, to "cover" for those idle instances that became busy serving requests.
It is described in detail on the Adjusting Application Performance page.
Arrrgh! I suffer from this myself. This topic area has come up in several threads (GAE groups & SO). If someone can dial in the settings for a low-traffic site (billing on/off), that would be a real benefit. IIRC, someone with what I think is deep GAE experience noted in one thread that the scheduler does not do well with very low-volume apps. I have also seen wildly different startup times within a relatively short period: painful to see a spin-up take 700ms and then 7000ms just a few minutes later. Overall the issue is not so much the cost to me, but rather the waste of infrastructure resources. In testing I've had two instances running despite having pinged the app with an RPC only once every few minutes. If 50k other developers are testing similarly, that could add up to a significant waste.

Google App Engine - Request was aborted after waiting too long to attempt to service your request

I get this error sometimes.
"Request was aborted after waiting too long to attempt to service your request. Most likely, this indicates that you have reached your simultaneous dynamic request limit. This is almost always due to excessively high latency in your app. Please see http://code.google.com/appengine/docs/quotas.html for more details."
The request that causes it has 10 seconds of latency and 0ms of CPU time. It is a simple JSP page that doesn't do anything that takes long at all. Also, my app is very low traffic, and each time it has happened, it was the only request being processed.
What causes this?
If your application is low-traffic, it's possibly the startup time. There seems to be an ongoing issue where it takes so long to start an instance up that it breaches the time limit.
Some people have "worked around" this by having a cron/scheduled request that runs every few minutes and does nothing (though personally I think this is counter-productive, somewhat undermining the reason Google spins your app down!).
There was an issue in their bugtracker about this:
http://code.google.com/p/googleappengine/issues/detail?id=2456
It's now marked as fixed for version 1.4, and there's a little info on it here:
http://googleappengine.blogspot.com/2010/12/happy-holidays-from-app-engine-team-140.html
Always On - For high-priority applications with low or variable traffic, you can now reserve instances via App Engine's Always On feature. Always On is a premium feature costing $9 per month which reserves three instances of your application, never turning them off, even if the application has no traffic. This mitigates the impact of loading requests on applications that have small or variable amounts of traffic.
