Why are we experiencing huge latency on one autoscaled Google App Engine instance when several others are available? - google-app-engine

Our autoscaling parameters in app.yaml are as follows:
automatic_scaling:
  min_idle_instances: 3
  max_idle_instances: automatic
  max_pending_latency: 30ms
  max_concurrent_requests: 20
The result is 3 resident instances and typically 2-6 dynamic instances (depending on traffic), but the load distribution among the instances seems inefficient. In the screenshot below we see one instance handling the vast majority of requests, with a massive 21s latency (in the last minute).
To me this indicates there must be something wrong with our setup that explains these high latencies.
Has anyone experienced issues like this with GCP or App Engine?

Idle instances aren't used to balance current load. They bridge the gap while new dynamic instances are spinning up. In your setup it might be worth trying just one or two idle instances and fiddling with the min and max pending latency.
Pending latency measures how long a request stays in the queue before it is handled by an instance. The latency you see in your screenshot is the time between request and response. If any single request takes 21 seconds it would look like this. The pending latency could still be below 30ms though.
You should check your logs to see which requests take so long and probably break them up into smaller chunks of work. Many small jobs scale much better than a few huge ones. Pending latency will also go up with lots of small jobs, which will cause your app to scale properly.
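As a rough sketch of that suggestion (the specific values below are assumptions to experiment with, not derived from your traffic), the automatic_scaling block could look something like:
automatic_scaling:
  # idle instances only bridge the gap while new dynamic instances spin up,
  # so one or two is usually plenty
  min_idle_instances: 1
  max_idle_instances: 2
  # let requests wait a little longer in the pending queue before the
  # scheduler starts another instance
  min_pending_latency: 100ms
  max_pending_latency: 1000ms
  max_concurrent_requests: 20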

Related

Exceeded soft memory limit of 243 MB with 307 MB after servicing 4330 requests total. Consider setting a larger instance class in app.yaml

Situation:
My project consists mostly of automated tasks.
My GAE (standard environment) app has 40 cron jobs like this, all running on the default module (frontend):
- description: My cron job Nth
  url: /mycronjob_n/    # note: n is the nth cron job
  schedule: every 1 minutes
Each of the cron jobs looks like this:
from google.appengine.api.taskqueue import TaskRetryOptions
from google.appengine.ext import deferred

@app.route('/mycronjob_n/')
def mycronjob_n():
    for i in range(0, 100):
        pram = prams[i]
        options = TaskRetryOptions(task_retry_limit=0, task_age_limit=0)
        deferred.defer(mytask, pram, _retry_options=options)
Where mytask is:
def mytask(pram):
    # Do some loops, read and write the datastore, call an API;
    # I guess this takes less than 30 seconds.
    return 'Task finish'
Problem:
As the title of the question says, I am running out of RAM, and frontend instance hours are climbing to 100 hours.
My (possibly wrong) assumptions:
A deferred task runs in the background because it is not something a user requests when visiting the website; therefore it will not be counted as a request.
I broke each mycronjob_n into small separate tasks because I thought it would reduce the running time of each mycronjob_n and therefore REDUCE the instances' RAM consumption.
My questions (purpose: keep the frontend/backend instance hours as low as possible; I accept latency):
Is a deferred task counted as a request?
How many requests do I have in 1 minute?
40 requests of mycronjob_n
or
40 requests of mycronjob_n x 100 mytask = 4000?
If 3-4 instances cannot handle 4000 requests, why doesn't GAE add 10 to 20 more F1 instances and then shut them down once idle? I set autoscaling in app.yaml, but I don't see GAE's autoscaling working here as advertised.
What is the best way to optimize my app?
If a deferred task is counted as a request, it is pointless to split mycronjob_n into small separate tasks, right? I mean, my current method is the same as:
@app.route('/mycronjob_n/')
def mycronjob_n():
    for i in range(0, 100):
        pram = prams[i]
        options = TaskRetryOptions(task_retry_limit=0, task_age_limit=0)
        mytask(pram)  # call mytask directly instead of deferring it
Here, will my app have 40 requests per minute, each request running for 100 x 30s = 3000s? So will this approach also run out of memory?
Should I create a backend service running on an F1 instance and put all the cron jobs on that backend service? I heard that a request there can run for 24 hours.
If I change the default service instance class from F1 to F2 or F4, will I still get 28 free hours? I heard the free tier applies only to F1. And will my backend service get 9 free hours if it runs on B2 instead of B1?
My regret:
- I quite regret choosing GAE for this project. I chose it because it has a free tier, but I have realized the free tier is really only for hobby/testing purposes. If I run a real app, the cost increases so fast that it makes me think GAE is expensive. Datastore reads/writes are very expensive even though I have tried my best to optimize them. The frontend hours are also always high. I am paying 40 USD per month for GAE. With 40 USD per month, maybe I could get a better server with Heroku or DigitalOcean? Do you think so?
Yes, task queue requests (deferred ones included) are also requests; they are simply allowed to run longer than user requests. And they need instances to serve them, which counts towards instance hours. Since you have at least one cron job running every minute, you will never have a 15-minute idle interval that would allow your instances to shut down, so you'll need at least one instance running at all times. If you use any instance class other than F1/B1, you'll exceed the free instance-hours quota. See Standard environment instances billing.
You seem to be under the impression that the number of requests is what's driving your costs up. It's not, at least not directly. The culprit is most likely the number of instances running.
If 3-4 instances cannot handle 4000 requests, why doesn't GAE add 10 to 20 more F1 instances and then shut them down once idle?
Most likely GAE does exactly that - spawns several instances. But you keep pumping in requests every minute, so the instances never stay idle long enough to shut down. Which drives your instance hours up.
There are 2 things you can do about it:
stagger your deferred tasks so they don't all need to be handled at the same time; fewer instances (maybe even a single one?) may then be enough to handle them (see the sketch after this list). See Combine cron jobs to reduce number of instances and Preventing Google App Engine Cron jobs from creating multiple instances (and thus burning through all my instance hours)
tune your app's scaling configuration (the range is limited though). See Scaling elements.
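For the staggering idea, a minimal sketch (the 2-second spacing is an assumption; you said latency is acceptable) using the _countdown argument that deferred.defer forwards to the task queue:
from google.appengine.ext import deferred

def mycronjob_n():
    for i, pram in enumerate(prams):
        # spread the ~100 tasks over a few minutes instead of enqueueing
        # them all for immediate execution, so fewer instances are needed
        deferred.defer(mytask, pram, _countdown=i * 2)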
You should also carefully read How Instances are Managed.
Yes, you only pay for what exceeds the free quota, regardless of the instance class. Billing is in F1/B1 units anyway - from the billing link above:
Important: When you are billed for instance hours, you will not see any instance classes in your billing line items. Instead, you will see the appropriate multiple of instance hours. For example, if you use an F4 instance for one hour, you do not see "F4" listed, but you see billing for four instance hours at the F1 rate.
About the RAM usage: splitting the cron job into multiple tasks isn't necessarily helping; see App Engine Deferred: Tracking Down Memory Leaks.
Finally, cost-comparing GAE with Heroku or Digital Ocean isn't an apples-to-apples comparison: GAE is a PaaS, not IaaS, so it's IMHO expected to be more expensive. Choosing one or the other is really up to you.

Preparing for a flash crowd on Google App Engine

I recently experienced a sharp, short-lived increase in the load on my service on Google App Engine. The load went from ~1-2 req/second to about 10 req/second for a couple of hours. My number of dynamic instances scaled up pretty quickly, but in the process I did get a number of "Request waited too long" timeout messages.
So the next time around, I would like to be prepared with enough idle instances to handle my load. But now the question is, how do I determine how many is adequate. I expect a much larger burst in load this time - from practically nothing to an average of 500 requests/second, possibly with a peak of 3000. This is to last between 15 minutes and 1 hour.
My main goal is to ensure that the information passed via HTTP Post is saved to the datastore by means of a single write.
Here are the steps I have taken to prepare for the burst:
I have pruned the fast path to disable analytics and other reporting, which typically generate 2 urlfetch requests.
The datastore write is to be deferred to a taskqueue via the deferred library
What I would like to know is:
1. Tips/insights into calculating how many idle instances one would need per N requests/second.
2. It seems that the maximum throughput of a task queue is 500/second. Is this the rate at which you can push tasks, and if not, then is there a cap on that? I'm guessing not, since these are probably just datastore writes, but I would like to be sure.
My fallback plan if I am not confident of saving all of the information for this flash mob is to set up a beefy Amazon EC2 instance, run a web server on it and make my clients send a backup request to this server.
You must understand that idle instances are only used while new frontend instances are being spun up. This means they are only used during traffic increases; when traffic is steady they are not used.
Now if your instance needs 20 sec to spin up and can handle 10 req/sec of steady traffic, and your traffic is increasing by 5 req/sec every second, then you'll need 20 * 5 / 10 = 10 idle instances to absorb the ramp-up without dropping any requests.
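The same back-of-the-envelope calculation as a tiny helper (the numbers are just the ones from the example above, not measurements from your app):
def idle_instances_needed(spinup_sec, increase_per_sec, per_instance_throughput):
    # requests that pile up while one new instance spins up, divided by
    # how many req/sec a single warm instance can absorb
    return (spinup_sec * increase_per_sec) / float(per_instance_throughput)

print(idle_instances_needed(20, 5, 10))  # -> 10.0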
What you should do is:
Maximize instance throughput (the number of requests it can handle): optimize code, use async datastore operations and enable Concurrent Requests.
Minimize your instance startup time. This is important because idle instances are used while new instances spin up, and the time it takes to spin up a new instance directly relates to how many idle instances you need. If you use Java this means getting rid of any heavy frameworks that do classpath scanning (Spring, etc.).
Finally, the number of frontend instances needed is VERY application specific. But since you have already had a traffic increase, you should know how many requests per second a single frontend instance can handle.
Edit: There is one more obvious thing you should do: HTTP caching. GAE has a transparent HTTP cache which can be controlled simply via Cache-Control headers.
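A minimal sketch of that for a Python 2.7 / webapp2 handler (the 5-minute max-age is an arbitrary assumption; pick whatever staleness you can tolerate):
import webapp2

class CachedPage(webapp2.RequestHandler):
    def get(self):
        # mark the response as publicly cacheable for 5 minutes so repeat
        # hits can be served from the cache without touching an instance
        self.response.headers['Cache-Control'] = 'public, max-age=300'
        self.response.write('hello')

app = webapp2.WSGIApplication([('/', CachedPage)])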
Also, if analytics has a big performance impact on your server, consider using client side analytics services (like Google Analytics). They also work for devices.

Handling AppEngine outage with pending_ms during several minutes

Today AppEngine went down for a while:
http://code.google.com/status/appengine/detail/serving/2012/10/26#ae-trust-detail-helloworld-get-latency
The result was that all requests were kept pending, some for as long as 24 minutes. Here is an excerpt from my server log. These requests are normally handled in less than 200 ms.
https://www.evernote.com/shard/s8/sh/ad3b58bf-9338-4cf7-aa35-a255d96aebbc/4b90815ba1c8cd2080b157a54d714ae0
My quota ($8 per day) was blown through in a matter of minutes, when it previously sat at around $2 per day.
How can I prevent pending_ms from eating all my quota, even though my actual requests still respond very fast? I had the pending latency set from 300 ms to Automatic. Would limiting the maximum to 10 seconds prevent this type of blowout?
blackjack75,
You're right, raising the pending latency to something like 10 seconds will help reduce the number of instances started.
It looks like the long-running requests tied up your instances. When this happens, App Engine spins up new instances to handle the new requests, and of course instances cost money.
Lowering your min and max idle instances to smaller numbers should also help.
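A sketch of those two changes expressed in app.yaml (the exact values are assumptions to tune against your budget, not figures from your dashboard):
automatic_scaling:
  # keep no more than one resident instance around
  min_idle_instances: 1
  max_idle_instances: 1
  # let requests queue for up to ~10 seconds before another instance is
  # started, so a temporary stall doesn't fan out into many instances
  min_pending_latency: 1000ms
  max_pending_latency: 10s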
On your dashboard, you can look at your instance graph to see how long the burst of instances was left idle after the request load finished.
You can look at your typical usage to help estimate a safe max.
Lowering them can cause slowness when legitimate traffic needs to spin up a new instance, especially with bursty traffic, so you would want to adjust this to match your budget. For comparison, on a non-production appspot app, having the min and max set to 1 works fine.
Besides that, general techniques for reducing App Engine resource usage will help. It sounds like you've gone through that already since your typical request time is low. Enabling concurrent requests could help here if your code handles threads correctly (no globals, etc.) and your instances have enough free memory to handle multiple requests.

App Engine loading request even when idle instance available

I have a simple app running on App Engine but I'm having odd problems with latency. It's a Python 2.7 app and a loading request takes between 1.5 and 10 secs (I guess depending on how GAE is feeling). This is a low traffic site right now, so previously GAE was sitting with no idle instances and most request were loading requests, resulting in a long wait time on the first page view.
I've tried configuring the minimum number of idle instances to "1" so that these infrequent page views can immediately hit a warm instance.
However, I've seen several cases now where even with one instance sitting unused, GAE will route an incoming request to a loading instance, leaving the warm instance untouched:
gae dashboard showing odd scheduling
How can I prevent this from happening? I feel I must be understanding something wrong, because I certainly don't expect this behavior.
Update: Also, what makes this even less comprehensible is that the app has threadsafe enabled, so I really don't understand why GAE would get flustered and spin up an instance for a single, lone request.
Actually, I believe this is normal behavior. Idle instances are supposed to guarantee a minimum number of instances always available (for spiky load).
So, when some requests start coming in, they are initially served by idle instances, but at the same time AE scheduler will start launching new instances to always guarantee the same amount of idle instances even during suddenly increased load. That is, to "cover" for those idle instances that became busy serving requests.
It is described in detail on the Adjusting Application Performance page.
Arrrgh! Suffer from this myself. This topic-area has come up in several threads (GAE groups & SO). If someone can dial-in the settings for a low-traffic site (billing on/off), that would be a real benefit. IIRC, someone with what I think is deep GAE experience noted in one thread that the Scheduler does not do well with very low volume apps. I have also seen wildly different startup times within a relatively short period of time. Painful to see a spinup take 700ms then 7000ms just a few minutes later. Overall the issue is not so much the cost to me, but more so the waste of infrastructure resources. In testing I've had two instances running despite having pinged the app with an RPC once every few minutes. If 50k other developers are similarly testing, that could accumulate into a significant waste.

Google AppEngine sending all requests to same instance

Lately, I have seen GAE taking much, much longer to process requests than it did just a week ago. Nothing changed in my code, but GAE is now taking 4000-12000ms to respond to requests. What makes it worse is that I have plenty of instances available with 0 requests on them.
Has anyone else seen this happen?
What can I do to fix it? I have gone as far as to spin up 15 extra instances (and paid through the nose for them), but nothing seems to send requests to the other idle instances reliably.
My bill has gone from 70-90c/day to $5-8/day without any code change or increase in traffic. In fact, I am losing traffic because of the huge latency.
QPS* Latency* Requests Errors Age Memory Availability
0.000 0.0 ms 1378 0 10:10:09 57.9 MBytes Dynamic
0.000 0.0 ms 1681 0 15:39:57 57.2 MBytes Dynamic
0.017 9687.0 ms 886 0 10:19:10 56.7 MBytes Dynamic
I recommend installing AppStats to get a picture of what's taking so long in each request. I'd guess that you're having some contention issues or large numbers of reads/writes caused by some new data configuration.
The idle instances won't help decrease latency - it looks like every request takes a long time, and with less than one request per minute (in this sample anyway), 10s requests could run serially on the same instance.
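If it helps, wiring AppStats into a Python 2.7 app is just a builtin plus a WSGI middleware hook; a minimal sketch, assuming the standard file layout:
# app.yaml
builtins:
- appstats: on

# appengine_config.py
def webapp_add_wsgi_middleware(app):
    from google.appengine.ext.appstats import recording
    return recording.appstats_wsgi_middleware(app)
The recorded traces are then viewable under /_ah/stats.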
We have a similar problem in our app. In our case, we are under the impression that GAE's scheduler did a poor job of balancing requests across existing instances.
In some cases, the scheduler decided to spin up new instances instead of reusing already existing ones. Since spinning up a new instance took 5 to more than 45 seconds, I suspect this might be what happened to you.
Try to investigate the following and see if it helps you:
Make sure your app has threadsafe enabled so that it can process concurrent requests. You configure this in your app.yaml if you are using Python, or in your appengine-web.xml if you use Java. Of course, you also need to make sure the code in your app actually is threadsafe.
In your application settings, if it is still set to automatic, change the minimum pending latency to a non-automatic setting. I'd suggest around 10 seconds for now; you can experiment later with which setting suits you best. This forces the scheduler to wait that long for an existing instance to become available before spinning up a new one (a sketch of both settings follows this list).
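For the Python case, a minimal sketch of both settings in app.yaml (the runtime lines are assumptions about your setup; the pending latency can also be changed from the Application Settings page):
runtime: python27
api_version: 1
threadsafe: true  # allow a single instance to serve concurrent requests

automatic_scaling:
  # wait up to ~10 seconds for a busy instance to free up before the
  # scheduler spins up a new one
  min_pending_latency: 10s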
Now, to answer your original question about all requests going to the same instance: as far as I know there is no way to address a specific front-end instance in order to direct requests to it.
What you could do is migrate your app to use backend instances instead of regular frontend instances. Backends provide a way to directly target any particular instance. You could deploy your app on a single backend to have more control over the number of instances you spawn. And since using a backend bypasses the scheduler, you would not encounter latencies caused by new instances spinning up.
The major drawback of this approach is that you lose the auto-scalability of front-end instances. But judging from your low daily bill, I think scalability is not yet a major concern at your app's scale.
