Why GAE go runtime 1.11 max concurrent requests not working? - google-app-engine

Since moving to the GAE Go 1.11 runtime, we have noticed that the number of instances is much higher. When I dug into the problem, it seemed that GAE is not handling requests concurrently.
Here are the frontend settings for a very light module:
automatic_scaling:
  min_idle_instances: 1
  max_idle_instances: 2
  min_pending_latency: 0.030s
  max_pending_latency: automatic
  max_concurrent_requests: 80
  target_throughput_utilization: 0.95
And with about 50 requests per second, GAE spun up 2 active instances. Each has about 25 QPS and the average latency is under 20ms. Even the chart shows the instances aren't really busy.
What is in the settings that would cause this issue?
I don't think the Go 1.9 runtime has this issue. And the documentation says Go 1.9 ignores the max_concurrent_requests setting, which should make the Go 1.11 runtime perform much better.

With max_pending_latency: automatic, your app scales up when the pending latency goes above 30ms. Even if the average latency with 2 instances is now somewhat under 20ms, it may have gone over 30ms for a short period initially, which would have triggered the scaling. If you do not want this to happen, you can set max_pending_latency manually to a value above 30ms.
Regarding the comparison with Go 1.9, Go 1.11 is known to consume slightly more RAM and CPU than its predecessor, so this would be normal.
In conclusion, I do not think what is happening in your situation is an issue; it looks normal. If you disagree, you can provide your whole, sanitized app.yaml file and I will look deeper to see if anything is wrong and edit my answer.
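As a rough back-of-the-envelope check (not part of the original answer), Little's law estimates how many requests are in flight on an instance at once from the figures quoted in the question. The function name below is hypothetical, and the result is only an average; it says nothing about short bursts.

```python
def average_concurrency(requests_per_second, average_latency_seconds):
    """Little's law: mean in-flight requests = arrival rate * mean time in system."""
    return requests_per_second * average_latency_seconds


# Figures from the question: about 25 QPS per instance, latency under 20 ms.
per_instance = average_concurrency(25, 0.020)
print("average in-flight requests per instance:", per_instance)
print("fraction of max_concurrent_requests (80) in use:", per_instance / 80.0)
```

On these numbers each instance handles well under one request at a time on average, so the concurrency limit of 80 is unlikely to be what triggered the second instance, which is consistent with the pending-latency explanation above.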

Related

Reasons for "Exceeded soft memory limit of 256 MiB..."

Before I get dinged for a dup, I've actually gone through many of the other posts on the soft memory limit, and they never really explain what the common causes are. My question here is about what could be causing this, and whether it's just a function or if it could be my yaml settings or being slammed by bots.
Here is my logs histogram:
As you can see, I'm getting an error of this type about once an hour, and some intermittent warnings, but it's not the lion's share of the service. I recently learned that this has been happening since early December, and increasingly so. I figured it was just an issue of inefficient code (Python/Flask) and refactored my index page, but it's still happening and not significantly diminishing even after a serious refactor:
Exceeded soft memory limit of 256 MiB with 280 MiB after servicing 956 requests total. Consider setting a larger instance class in app.yaml.
293 MiB after servicing 1317 requests
260 MiB after servicing 35 requests
The strange thing is that it's happening on pages like
/apple-touch-icon.png
that should just 404.
Here are some other things that may be causing the problem. First, my app.yaml file has settings that I added before my site was as popular, and they are extremely lean to say the least:
# instance_class: F1 (default)
automatic_scaling:
  max_instances: 3
  min_pending_latency: 5s
  max_pending_latency: 8s
  # max_concurrent_requests: 20
  target_cpu_utilization: 0.75
  target_throughput_utilization: 0.9
The small instances, min and max latency, and cpu utilization are all obviously set for slower service, but I'm not made of money, and the site isn't generating revenue.
Secondly, looking at the logs recently, I'm getting absolutely slammed by webcrawlers. I've added them to robots.txt:
User-Agent: MJ12bot
Crawl-Delay: 20
User-Agent: AhrefsBot
Crawl-Delay: 20
User-Agent: SemrushBot
Crawl-Delay: 20
It looks like all but Semrush have died down a bit.
Anyway, thoughts? Do I just need to upgrade to F2, or is there something in the settings that I've definitely got wrong?
Again, I've very seriously refactored the main pages that trigger the alert, but it seems not to have helped. The real issue is that I'm just a coder without a networking background, so I honestly don't even know what's happening.
In my experience, only really simple apps will fit in an F1 without periodic memory errors. I don't know if the cause is Python or GAE, but memory cleanup does not work well on Python/GAE.
That said, GAE automatically restarts instances when there is a memory error, so you can probably ignore it unless occasional slow responses to end users are a deal breaker for you.
I would just upgrade to F2 unless you are really on a budget.
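Before upgrading, it may help to see how memory grows across requests. The following is only a minimal sketch, assuming the standard-library resource module is usable in your runtime (it may not be in every sandbox); the function names are hypothetical, and the 256 MiB default is taken from the log message in the question.

```python
import resource
import sys


def peak_rss_mib():
    """Return this process's peak resident set size in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    divisor = 1024.0 * 1024.0 if sys.platform == "darwin" else 1024.0
    return peak / divisor


def over_soft_limit(limit_mib=256.0):
    """True when peak memory already exceeds the given limit."""
    return peak_rss_mib() > limit_mib


print("peak RSS: %.1f MiB" % peak_rss_mib())
```

Logging this value at the end of each request would show whether memory climbs steadily with request count or jumps on particular handlers.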

GAE: Explain graph of instance usage

Why Active: 0? I am running a lot of cron jobs and task queues, so I expected Active: 1 or Active: 2.
Billed Instances Estimate: 1.00. Does that mean I only have to pay for 1 instance even though I see 2 instances running, with 700 and 62 requests respectively?
I see the values 0, 0.5, 1.0, 1.5, 2.0, 2.5 on the vertical axis. Why, and how can the number of instances be 0.5, 1.5 or 2.5?
My app.yaml is
automatic_scaling:
  max_idle_instances: 1
  min_idle_instances: 0
  max_concurrent_requests: 80
  target_cpu_utilization: 0.9
  min_pending_latency: 500ms
How can I set the max number of instances to 1?
I do not want to have 2 instances, because 1 instance running for 24 hours equals 24 hours (within the free tier).
The graph can be confusing; similar questions popped into my head at the beginning. So I closely watched the graphs and the numbers on the summary page while doing tests for more than a month, and compared the projections I made from those observations with the actual bill I got. I concluded that the graphs aren't very precise; I trust the numbers more. I only check the graphs to get a feeling for traffic patterns, and I mostly disregard their estimates for billing purposes.
Another thing I noticed is that GAE doesn't aggressively kill idle instances right away; it just stops taking them into account for billing.
As for setting the max number of instances - the capability has been recently added. From Scaling elements:
max_instances
Optional. Specify a value between 0 and 2147483647, where zero
disables the setting. This parameter specifies the maximum number of
instances for App Engine to create for this module version. This is
useful to limit the costs of a module.
Important: If you use appcfg from the App Engine SDK for Python to deploy, you cannot use this parameter in your app.yaml. Instead,
set the parameter as described in Setting Autoscaling Parameters in
the API Explorer, or by using the App Engine Admin API.
If you are only using 1 instance no matter what, you might as well use manual scaling.
Sometimes App Engine keeps an instance alive (for a variety of reasons, such as traffic prediction). You would not be billed for idle instances that you did not provision (beyond the 15 minute shutdown time). Billable instances are often not the same as created or active instances. The graph is mostly meant for monitoring traffic and is not really designed for cost calculation (it's a line graph, so it's hard to calculate cost from it without doing calculus). It's best to simply use your bill each cycle to track your actual usage.
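To illustrate what "doing calculus" on a line graph would amount to, here is a minimal sketch that integrates sampled instance counts over time with the trapezoid rule. The sample readings and the function name are hypothetical, and the result is only an estimate of instance time, not what App Engine actually bills.

```python
def instance_hours(samples):
    """Integrate (timestamp_seconds, instance_count) samples with the trapezoid rule."""
    total_instance_seconds = 0.0
    for (t0, n0), (t1, n1) in zip(samples, samples[1:]):
        total_instance_seconds += (t1 - t0) * (n0 + n1) / 2.0
    return total_instance_seconds / 3600.0


# Hypothetical readings: one instance for an hour, then two for an hour.
samples = [(0, 1), (3600, 1), (3600, 2), (7200, 2)]
print(instance_hours(samples))
```

Even with such an estimate, the bill remains the authoritative figure, since billed instances can differ from the instances shown as running.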

Why are we experiencing huge latency on one autoscaled Google App Engine instance when several others are available?

Our autoscaling parameters in app.yaml are as follows:
automatic_scaling:
  min_idle_instances: 3
  max_idle_instances: automatic
  max_pending_latency: 30ms
  max_concurrent_requests: 20
The result is 3 resident instances and typically 2-6 dynamic instances (depending on traffic), but the load distribution among the instances seems inefficient. In the screenshot below we see 1 instance with the vast majority of requests, and a massive 21s latency (in last minute).
To me this indicates there must be something wrong with our setup to explain these high latencies.
Has anyone experienced issues like this with GCP or App Engine?
Idle instances aren't used to balance current load. They bridge the gap while new dynamic instances are spinning up. In your setup it might be worth trying just one or two idle instances and fiddling with the min and max pending latency.
Pending latency is measured by how long a request stays in the queue before it is handled by an instance. The latency you see in your screenshot is the time between request and response. If any single request takes 21 seconds it would look like this. The pending latency could still be below 30ms though.
You should check your logs to see which requests take so long, and probably break them up into smaller chunks of work. Many small jobs scale much better than huge jobs. Pending latency will also go up with lots of small jobs, which will cause your app to scale properly.
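The difference between pending latency and request latency can be sketched with a toy model. It assumes a single worker that handles one request at a time in arrival order, which is a simplification and not how App Engine schedules requests; the function name and the timings are hypothetical.

```python
def latencies(arrivals, service_times):
    """Return (pending, total) latency per request for one FIFO worker."""
    results = []
    free_at = 0.0
    for arrival, service in zip(arrivals, service_times):
        start = max(arrival, free_at)   # wait in the queue if the worker is busy
        pending = start - arrival       # time spent queued (pending latency)
        free_at = start + service
        results.append((pending, pending + service))
    return results


# Hypothetical: one 21 s request, with the next requests arriving after it ends.
print(latencies([0.0, 30.0, 31.0], [21.0, 0.01, 0.01]))
# Hypothetical: a short request arrives while the 21 s request is still running.
print(latencies([0.0, 1.0], [21.0, 0.01]))
```

In the first case the slow request shows a 21 second total latency while every pending latency stays at zero, so a 30ms max_pending_latency would never trigger. In the second case the short request queues behind the slow one, which is the situation pending latency is meant to detect.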

How to limit instance spawning (autoscaling) on GAE?

I have noticed a recent surge in instance spawning on GAE.
In my app.yaml I have clearly defined that max two instances should be created at a time.
application: xxx
version: 1-6-0
runtime: python27
api_version: 1
instance_class: F2
automatic_scaling:
  max_idle_instances: 2
threadsafe: true
However the dashboard is showing 4 instances and the bills are going up. How can I stop this madness? :)
I did a lot of research into this.
F instances are automatically scaled. There is no way to limit that. Hence it makes sense to move the actual work away from frontend instances and put that into a backend instance (B1 or B2). The latter provides another 8 hours free quota.
The real challenge is to re-architect the app to use a default app.yaml for web statics, a mobile.yaml for mobile requests with shorter min_pending_latency and a backend.yaml (B2) instance for handling the tasks and calculations.
All this needs to be routed properly via a dispatch.yaml. In this file you can specify which url endpoint will be handled effectively by which module.
The best way to understand it is to look at this excellent example from GAE:
It makes sense to get it working in the local environment first, before trying anything on a remote server.
dev_appserver.py dispatch.yaml app.yaml mobile.yaml backend.yaml
Also this official documentation explains some of the above in more detail.
It's pretty impressive what can be achieved with GAE.
max_idle_instances
The maximum number of IDLE instances that App Engine should maintain
for this version.
It seems that it is not currently possible to set the max number of instances for an automatic scaling module. As @DoIT suggests, you can set a spending limit; however, keep in mind the following.
When an application exceeds its daily spending limit, any operation
whose free quota has been exhausted fails.
So if you need to control somehow the total number of instances and keep your service running, I see the following possibilities.
Change your scaling type to basic and set max_instances parameter as you like
Keep automatic scaling type and increase min_pending_latency and max_concurrent_requests parameters (multi-threading has to be enabled)
You can find more details here.
The max_idle_instances setting sets the max number of idle instances, i.e. instances that are waiting for a traffic spike. From your screenshot, it looks like all instances are getting traffic, so it looks OK to me. You can set a max daily budget if you want to control your spend on GAE.
It's possible that some of your requests are taking too long to complete, and thus cause new instances to be spawned. You could work around this (I'm told, but have yet to try it myself) by setting a high value for your min_pending_latency property. This could hurt your latency a little, but would also limit the rate of instance spawning.

How to migrate gae app to new config?

I'm updating my webapp, which has previously been using the default configuration. Now I'm starting to use modules and am trying this config:
application: newkoolproject
# Other settings here...
version: newkool
runtime: python27
api_version: 1
threadsafe: true
automatic_scaling:
  min_idle_instances: 5
  max_idle_instances: automatic  # default value
  min_pending_latency: automatic  # default value
  max_pending_latency: 30ms
  max_concurrent_requests: 50
I pay for instance hours, data reads and complex searches totalling a few dollars daily on my current budget (where my limit is 7 USD a day to avoid sudden spikes due to DoS attacks or other technical issues).
Could it be feasible for me to try to squeeze my app into the free tier using memcache and other technologies to reduce costs? Or should I forget about reaching the free tier (< 28 instance hours etc.) and instead make a configuration for a more optimal UX? How will the change affect my costs?
Update 141010 18:59 CET
I could add appstats
You'll need to turn Appstats on in your local App Engine development server and go through your typical user flow. Here are the instructions on how to do this: https://cloud.google.com/appengine/docs/python/tools/appstats
Make sure you turn calculate RPC costs on:
appstats_CALC_RPC_COSTS = True
Once you go through a typical user flow, you would go to localhost:8080/_ah/stats and it will estimate how much specific calls and flows will cost when you get to production. Really great tool, as it even helps identify bottlenecks and slow running areas within your application.
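For a Python 2.7 WSGI app, enabling Appstats usually comes down to a few lines in appengine_config.py. This is a minimal sketch based on the Appstats documentation linked above, assuming the first-generation SDK is installed; check that page for the details that apply to your framework.

```python
# appengine_config.py (sketch; assumes the first-generation Python 2.7 SDK)

# Ask Appstats to estimate the cost of each RPC it records.
appstats_CALC_RPC_COSTS = True


def webapp_add_wsgi_middleware(app):
    """Wrap the WSGI app so Appstats records the RPCs made by each request."""
    # Imported lazily so this file still loads where the SDK is not on the path.
    from google.appengine.ext.appstats import recording
    return recording.appstats_wsgi_middleware(app)
```

The recorded data is then browsable at the /_ah/stats path mentioned above, provided the Appstats handler is also enabled in app.yaml as the documentation describes.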
Google's recommendation is to not only use memcache, but also to split work into smaller units (as much as possible) by leveraging task queues.
UPDATE: Simple memcache usage example
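from google.appengine.api import memcache

def get_results():
    # memcache.get returns None on a cache miss.
    my_results = memcache.get("SOME-KEY-FOR-THIS-ITEM")
    if my_results is None:
        # Do work here and assign the result to my_results
        memcache.set("SOME-KEY-FOR-THIS-ITEM", my_results)
    return my_results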
