App Engine loading request even when idle instance available - google-app-engine

I have a simple app running on App Engine but I'm having odd problems with latency. It's a Python 2.7 app and a loading request takes between 1.5 and 10 secs (I guess depending on how GAE is feeling). This is a low traffic site right now, so previously GAE was sitting with no idle instances and most request were loading requests, resulting in a long wait time on the first page view.
I've tried configuring the minimum number of idle instances to "1" so that these infrequent page views can immediately hit a warm instance.
However, I've seen several cases now where even with one instance sitting unused, GAE will route an incoming request to a loading instance, leaving the warm instance untouched:
gae dashboard showing odd scheduling
How can I prevent this from happening? I feel I must be understanding something wrong, because I certainly don't expect this behavior.
Update: Also, what makes this even less comprehensible is that the app has threadsafe enabled, so I really don't understand why GAE would get flustered and spin up an instance for a single, lone request.

Actually, I believe this is normal behavior. Idle instances are supposed to guarantee a minimum number of instances always available (for spiky load).
So, when some requests start coming in, they are initially served by idle instances, but at the same time AE scheduler will start launching new instances to always guarantee the same amount of idle instances even during suddenly increased load. That is, to "cover" for those idle instances that became busy serving requests.
It is described in details on Adjusting Application Performance page.

Arrrgh! Suffer from this myself. This topic-area has come up in several threads (GAE groups & SO). If someone can dial-in the settings for a low-traffic site (billing on/off), that would be a real benefit. IIRC, someone with what I think is deep GAE experience noted in one thread that the Scheduler does not do well with very low volume apps. I have also seen wildly different startup times within a relatively short period of time. Painful to see a spinup take 700ms then 7000ms just a few minutes later. Overall the issue is not so much the cost to me, but more so the waste of infrastructure resources. In testing I've had two instances running despite having pinged the app with an RPC once every few minutes. If 50k other developers are similarly testing, that could accumulate into a significant waste.

Related

When does Google App Engine start or stop an instance?

We have an App Engine app that handles an average .5 requests per second, and seemingly all those requests can be handled by the same instance running a Go app as the main version.
However, sometimes App Engine kicks off a second instance (and sometimes even a third one), that doesn't seem to do anything past handling one or two requests. Here's an example.
Shutting down that instance manually doesn't seem to cause any harm, so my question is, why does App Engine not kill the instance after it did not get any requests for a while? (The above example had four requests in the past hour, often the requests/age ratio gets even lower).
Update:
A similar situation is when an instance is started on a different version. App Engine only seems to kill the instance after hours of not getting any requests.
Under Application Settings → Performance,
Idle Instances is set to Automatic – 20
Pending Latency is set to 150ms – 250ms
I wish I knew what controls if/when it kills idle instances, but I can't see any documentation of it.
To avoid excess instances starting, I think the main thing you can do here is increase the pending latency:
The Pending Latency slider controls how long requests spend in the pending queue before being served by an Instance of the default version of your application. If the minimum pending latency is high App Engine will allow requests to wait rather than start new Instances to process them. This can reduce the number of instance hours your application uses, but can result in more user-visible latency.
Even if you only average 4 requests/hour, if you happen to get two closely spaced I suppose it's possible it would start a new instance.
You can also see some small amount of information in the logs about why it started a new instance.
The "How Applications Scale" section of the Google App Engine documentation states:
Scaling in Instances
Each instance has its own queue for incoming requests. App Engine monitors the number of requests waiting in each instance's queue. If App Engine detects that queues for an application are getting too long due to increased load, it automatically creates a new instance of the application to handle that load.
App Engine also scales instances in reverse when request volumes decrease. This scaling helps ensure that all of your application's current instances are being used to optimal efficiency and cost effectiveness.
It also states you can "specify a minimum number of idle instances", and to "optimize for high performance or low cost" in the administration console.
Try setting the "Idle instances" field to something like 3 - 5, and "optimize for low cost" and see if that affects the instance kill time.

Identify why Google app engine is slow

I developed an application for client that uses Play framework 1.x and runs on GAE. The app works great, but sometimes is crazy slow. It takes around 30 seconds to load simple page but sometimes it runs faster - no code change whatsoever.
Are there any way to identify why it's running slow? I tried to contact support but I couldnt find any telephone number or email. Also there is no response on official google group.
How would you approach this problem? Currently my customer is very angry because of slow loading time, but switching to other provider is last option at the moment.
Use GAE Appstats to profile your remote procedure calls. All of the RPCs are slow (Google Cloud Storage, Google Cloud SQL, ...), so if you can reduce the amount of RPCs or can use some caching datastructures, use them -> your application will be much faster. But you can see with appstats which parts are slow and if they need attention :) .
For example, I've created a Google Cloud Storage cache for my application and decreased execution time from 2 minutes to under 30 seconds. The RPCs are a bottleneck in the GAE.
Google does not usually provide a contact support for a lot of services. The issue described about google app engine slowness is probably caused by a cold start. Google app engine front-end instances sleep after about 15 minutes. You could write a cron job to ping instances every 14 minutes to keep the nodes up.
Combining some answers and adding a few things to check:
Debug using app stats. Look for "staircase" situations and RPC calls. Maybe something in your app is triggering RPC calls at certain points that don't happen in your logic all the time.
Tweak your instance settings. Add some permanent/resident instances and see if that makes a difference. If you are spinning up new instances, things will be slow, for probably around the time frame (30 seconds or more) you describe. It will seem random. It's not just how many instances, but what combinations of the sliders you are using (you can actually hurt yourself with too little/many).
Look at your app itself. Are you doing lots of memory allocations in the JVM? Allocating/freeing memory is inherently a slow operation and can cause freezes. Are you sure your freezing is not a JVM issue? Try replicating the problem locally and tweak the JVM xmx and xms settings and see if you find similar behavior. Also profile your application locally for memory/performance issues. You can cut down on allocations using pooling, DI containers, etc.
Are you running any sort of cron jobs/processing on your front-end servers? Try to move as much as you can to background tasks such as sending emails. The intervals may seem random, but it can be a result of things happening depending on your job settings. 9 am every day may not mean what you think depending on the cron/task options. A corollary - move things to back-end servers and pull queues.
It's tough to give you a good answer without more information. The best someone here can do is give you a starting point, which pretty much every answer here already has.
By making at least one instance permanent, you get a great improvement in the first use. It takes about 15 sec. to load the application in the instance, which is why you experience long request times, when nobody has been using the application for a while

GAE: Why do I experience loading requests even though I have fixed the number of instances to exactly one?

I have a low-load application which experienced latency spikes (requests taking up to 10s to return) due to loading requests, as seen in the logs:
This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time.
Here I assume that "new process" means "new instance".
In order to avoid this, I fixed the number of idle instances to exactly one (max=1 and min=1), so there is always one instance running ("resident instance") and GAE shouldn't start new ones. Billing is enabled.
However, I still experience loading requests. Why? Can anything be done about this?
Idle instances are "reserve" instances - they are meant to handle spikes when traffic increases, not the "normal" traffic. Idle instances are used only during the spin-up of the dynamic instances.
So, when you have one idle instance and no dynamic instances running and you get a request, than the idle instance should handle the request, but a new dynamic instance will still be spun up.
I too experienced the same problem with my low-traffic app and here is the practical solution that almost always prevents my users to face a cold start :
- 1 resident F4 instance
- pending latency to 15 sec
- i worked so that my warmup request are as fast as possible (under 10 sec), still quite long cause i use the frameWork Play (Java)
- and when i really don t want to have any problems i create fake traffic by pinging my app.
With this config, the resident usually serves around 50 requests, during that time, a dynamic instance receives a warmup and then start serving.

Google AppEngine sending all requests to same instance

Lately, I have seen GAE taking much, much longer to process requests than it did just a week ago. Nothing changed in my code, but GAE now is taking 4000-12000ms to respond to requests. What makes is worse is that I have plenty of instances available with 0 requests on them.
Has anyone else seen this happen?
What can I do to fix it?I have gone as far as to spin up 15 extra instances (and paid through the nose for them) but nothing seems to send requests to the other idle instances reliably.
My bill has gone from 70-90c/day to $5-8/day without any code change or increase in traffic. In fact, I am losing traffic because of the huge latency.
QPS* Latency* Requests Errors Age Memory Availability
0.000 0.0 ms 1378 0 10:10:09 57.9 MBytes Dynamic
0.000 0.0 ms 1681 0 15:39:57 57.2 MBytes Dynamic
0.017 9687.0 ms 886 0 10:19:10 56.7 MBytes Dynamic
I recommend installing AppStats to get a picture of what's taking so long in each request. I'd guess that you're having some contention issues or large numbers of reads/writes caused by some new data configuration.
The idle instances won't help decrease latency - it looks like every request takes a long time, and with less than one request per minute (in this sample anyway), 10s requests could run serially on the same instance.
We have a similar problem in our app. In our case, we are under the impression that GAE's scheduler did a poor job in balancing requests to existing instances.
In some cases, the scheduler decided to spin up new instances instead of re-using already existing ones. Since spinning a new instance took 5 to >45 seconds, I suspect this might be what happened to you.
Try to investigate the following and see if it helps you:
Make sure your app has thread-safe enabled so that you could process concurrent requests. You could configure this in your app.yaml if you are using Python, or in your appengine-web.xml if you use Java. Of course, you also need to make sure that the code in your app is threadsafe.
In your application settings, if it is still set on automatic, change the minimum pending latency to a non-automatic setting. I'd suggest around 10 seconds for now, but you could experiment later on which setting would suit you the most. This force the scheduler to wait for a certain time to see if any instance is available within the time before spinning up a new instance.
Now, to answer your original question regarding sending all requests to same instance, as far as I know there is no way to address a specific front-end instance in order to direct the requests to that particular instance.
What you could do is migrate your app to use backend instances instead of the regular frontend instance. Backends provides a way to directly target any particular instance within it. You could deploy your app in a single backend to have more control on the number of instance that you spawn. And since using the backend bypass the scheduler, you would not encounter latencies caused by new instances spinning up.
The major drawback of using this approach is that you lose the auto-scalability benefit of using front-end instances. But seeing from your low daily billing, I think scalability is not yet a major concern for the scale of your app.

Google App Engine - Request was aborted after waiting too long to attempt to service your request

I get this error sometimes.
Request was aborted after waiting too
long to attempt to service your
request. Most likely, this indicates
that you have reached your
simultaneous dynamic request limit.
This is almost always due to
excessively high latency in your app.
Please see
http://code.google.com/appengine/docs/quotas.html
for more details.
The request that causes it has 10 seconds of latency and 0ms of cpu time. It is a simple jsp page that doesn't do anything that takes long at all. Also, my app is very low traffic, and all the times it has happened, it is the only request being processed.
What causes this?
If your application is low-traffic, it's possibly the startup time. There seems to be an ongoing issue where it takes so long to start an instance up, that they breach the time limit.
Some people have "worked around" this by having a cron/scheduled request that runs every few minutes that does nothing (though personally I think is counter-productive, somewhat undermining the reason Google spin your app down!).
There was an issue in their bugtracker about this:
http://code.google.com/p/googleappengine/issues/detail?id=2456
It's now marked as fixed for version 1.4, and there's a little info on it here:
http://googleappengine.blogspot.com/2010/12/happy-holidays-from-app-engine-team-140.html
Always On - For high-priority applications with low or variable traffic, you can now reserve instances via App Engine's Always On feature. Always On is a premium feature costing $9 per month which reserves three instances of your application, never turning them off, even if the application has no traffic. This mitigates the impact of loading requests on applications that have small or variable amounts of traffic.

Resources