App Engine flexible - auto scaling properly

My problem: my application cannot handle multiple requests (on the order of hundreds).
How can I set up a cost-effective auto scaling environment?
I've read the documentation, but it is not clear to me how I can set up the proper environment for my situation.
Can I estimate the number of instances, cpu_utilization, and other parameters according to my expected request volume?
My current yaml (default) is:
runtime: nodejs
env: flex
manual_scaling:
  instances: 1
resources:
  cpu: 0.5
  memory_gb: 2.5
  disk_size_gb: 10

You can estimate the resources either empirically or by referring to existing benchmarks, but the CPU load, for example, depends a lot on the processing you perform for each request you receive.
Here is an example that you can start building from.
You may start with a smaller number of max_num_instances and fine-tune the values as you go:
automatic_scaling:
  min_num_instances: 1
  max_num_instances: 15
  cool_down_period_sec: 180
  cpu_utilization:
    target_utilization: 0.6
  target_concurrent_requests: 100
At any moment, at least one instance will be up and running for your app.
No matter how much the load grows, there will never be more than 15 instances of the app.
The cool_down_period_sec prevents two subsequent scaling operations from being performed within less than 180 seconds of each other.
The target_utilization and target_concurrent_requests represent thresholds that can trigger a scaling operation.
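Putting the pieces together with the resource settings from your current file, the resulting app.yaml could look roughly like this (the resource values are simply carried over from the question and may themselves need tuning):
runtime: nodejs
env: flex
automatic_scaling:
  min_num_instances: 1
  max_num_instances: 15
  cool_down_period_sec: 180
  cpu_utilization:
    target_utilization: 0.6
  target_concurrent_requests: 100
resources:
  cpu: 0.5
  memory_gb: 2.5
  disk_size_gb: 10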

Related

Why would GAE not distribute the load to the other available instances?

First things first, here is my app.yaml:
runtime: nodejs10
env: standard
instance_class: F1
handlers:
- url: /.*
  script: auto
automatic_scaling:
  min_instances: 1
  max_instances: 20
inbound_services:
- warmup
I'm using Apache Benchmark for this:
ab -c30 -n100000 "${URL}"
What I notice in the GAE console is that I have 8 instances available, but only 3 take on 99% of the work. The rest serve either no requests or a very small portion.
Any idea what the problem could be here?
I would recommend using the max_concurrent_requests element in your app.yaml file, as this element sets the number of concurrent requests an automatic scaling instance can accept before the scheduler spawns a new instance (keep in mind that the maximum is 80).
Furthermore, you can also set max_pending_latency, which specifies the maximum amount of time App Engine should allow a request to wait in the pending queue before starting additional instances to handle requests, so that pending latency is reduced.
If you reach either limit, it is a signal to scale up, and the number of instances will be increased.
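As a rough sketch only (the numbers are illustrative starting points, not recommendations), the automatic_scaling block from the question could be extended along these lines:
automatic_scaling:
  min_instances: 1
  max_instances: 20
  max_concurrent_requests: 30   # scheduler spawns a new instance beyond this; hard ceiling is 80
  max_pending_latency: 100ms    # start more instances if requests wait longer than this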
The fact that the load is not evenly distributed across the running instances is normal and actually desired, as long as the number of instances still processing requests is sufficient to handle the current load with a satisfactory level of performance. This allows the other instances to stay idle long enough to be shut down automatically (due to inactivity).
This is part of the dynamic instance management logic used with automatic and basic scaling.

Why is GAE Go runtime 1.11 max_concurrent_requests not working?

Since moving to the GAE Go 1.11 runtime, we have noticed that the number of instances is much higher. When I dig into the problem, it seems that GAE is not handling requests concurrently.
Here is a very light module of the frontend settings:
automatic_scaling:
  min_idle_instances: 1
  max_idle_instances: 2
  min_pending_latency: 0.030s
  max_pending_latency: automatic
  max_concurrent_requests: 80
  target_throughput_utilization: 0.95
And with about 50 requests per second, GAE spun up 2 active instances. Each handles about 25 QPS and the average latency is under 20ms. Even the chart shows the instances aren't really busy.
What in the settings could cause this issue?
I don't think the Go 1.9 runtime has this issue. And the documentation said it ignores the max_concurrent_requests setting, which should make the Go 1.11 runtime perform much better.
The max_pending_latency: automatic setting will make your app scale up if your pending latency goes above 30ms. If, in the current situation with 2 instances, your average latency is somewhat under 20ms, it is quite possible that in the initial situation it went over 30ms for a short period of time, which triggered the scaling. If you do not want this to happen, you can always set max_pending_latency manually, with a value above 30ms.
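For example, only changing the relevant line (100ms is just an illustrative value above the 30ms threshold):
automatic_scaling:
  min_pending_latency: 0.030s
  max_pending_latency: 100ms   # any fixed value above 30ms instead of automatic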
Regarding the comparison with Go 1.9, it is known that Go 1.11 consumes slightly more RAM and CPU than its predecessor, so that part would be normal.
In conclusion, I do not think that what is happening in your situation is an issue; it looks like normal behavior. If you do not agree, you can provide your whole, sanitized app.yaml file and I will look deeper to see if anything is wrong and edit my answer.

Initial requests to datastore and cloud tasks have higher latency, is that normal?

My app engine service is written in Go. I have code that connects to Cloud Datastore before even the server listens on the port. There is a single Projection query that takes about 500ms reading just 4 entities. Does the first interaction with datastore have higher latency potentially as a connection needs to be established? Any way this datastore connection latency be reduced? Also, is there any difference in doing this db call before listening to the port vs doing it within the warmup request (this is an autoscaled instance).
Similar to the high initial latency for Cloud Datastore, I see a similar pattern for Cloud Tasks. Initial task creation can be as high as 500ms, but even subsequent ones are anywhere from 200 to 400ms. This is in us-central. I was actually considering moving a db update to a background task, but in general I am seeing the latency of task creation to be more or less the same as doing a transaction to read and update the data, giving no net benefit.
Finally, instance startup time is typically 2.5 to 3 seconds, with main getting called after about 2 seconds. My app's own startup work is the above-mentioned projection query cost of 500ms and nothing else. So, no matter how much I optimize my app startup, should I assume an additional latency of about 2 seconds?
Note that the load on the system is very light so these issues can't be because of high volume.
Update: deployment files as requested by Miguel (this is for a test environment investigating performance characteristics. Prod deployment will be more generous for instances)
default app:
service: default
runtime: go112
instance_class: F1
automatic_scaling:
  min_instances: 0
  max_instances: 1
  min_idle_instances: 1
  max_idle_instances: 1
  min_pending_latency: 200ms
  max_pending_latency: 500ms
  max_concurrent_requests: 10
  target_cpu_utilization: 0.9
  target_throughput_utilization: 0.9
inbound_services:
- warmup
backend app:
service: backend-services
runtime: go112
instance_class: B1
basic_scaling:
  idle_timeout: 1m
  max_instances: 1
200-500ms to initialize a client seems reasonable because a remote connection is being established. A 1-2 second cold start for App Engine also seems normal.
As you mentioned, you can experiment with a warmup request to reduce cold starts and initialize clients.
I would also recommend looking into the mode you are running your database in (Native mode vs Datastore mode). There is increased latency when using Datastore mode; for more info see Cloud Datastore Best Practices.
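If cold starts dominate, one option to experiment with (a sketch only, and it raises cost compared to min_instances: 0) is to keep a single resident instance so the warmup work happens before user traffic arrives:
automatic_scaling:
  min_instances: 1   # keep one instance warm instead of letting the service scale to zero
  max_instances: 1
inbound_services:
- warmup             # App Engine sends /_ah/warmup to new instances before routing traffic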

GAE: Explain graph of instance usage

Why is Active: 0? I am running a lot of cron jobs and task queue tasks, so I would expect Active: 1 or Active: 2.
Billed Instances Estimate: 1.00 - does it mean I only have to pay for 1 instance even though I see 2 instances running, which have 700 and 62 requests respectively?
I see the values 0, 0.5, 1.0, 1.5, 2.0, 2.5 on the vertical axis. Why, and how can the number of instances be 0.5, 1.5, or 2.5?
My app.yaml is
automatic_scaling:
  max_idle_instances: 1
  min_idle_instances: 0
  max_concurrent_requests: 80
  target_cpu_utilization: 0.9
  min_pending_latency: 500ms
How can I set the max number of instances to 1?
I do not want to have 2 instances, because 1 instance running 24 hours equals 24 instance hours (within the free tier).
The graph can be confusing - similar questions popped into my head as well at the beginning. So I watched the graphs and the numbers from the summary page closely while running tests for more than a month, and compared the projections from those observations with the actual bill I got. I concluded that the graphs aren't very precise; I trust the numbers more. I only check the graphs to get a feeling for traffic patterns, and I mostly disregard their estimates for billing purposes.
Another thing I noticed is that GAE isn't actually aggressively killing the idle instances right away; it just stops taking them into account for billing.
As for setting the max number of instances - the capability has been recently added. From Scaling elements:
max_instances
Optional. Specify a value between 0 and 2147483647, where zero disables the setting. This parameter specifies the maximum number of instances for App Engine to create for this module version. This is useful to limit the costs of a module.
Important: If you use appcfg from the App Engine SDK for Python to deploy, you cannot use this parameter in your app.yaml. Instead, set the parameter as described in Setting Autoscaling Parameters in the API Explorer, or by using the App Engine Admin API.
If you are only using 1 instance no matter what, you might as well use manual scaling.
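In app.yaml terms (assuming you deploy with gcloud rather than appcfg, per the note above), either of these caps you at a single instance:
automatic_scaling:
  max_instances: 1   # cap automatic scaling at one instance; it can still scale to zero when idle
  max_idle_instances: 1
  min_idle_instances: 0
  max_concurrent_requests: 80
  target_cpu_utilization: 0.9
  min_pending_latency: 500ms
or, if one instance should always be running regardless of traffic:
manual_scaling:
  instances: 1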
Sometimes App Engine will keep an instance alive for a variety of reasons, such as traffic prediction. You are not billed for idling instances that you did not provision (beyond the 15-minute shutdown time). The billable instance count is often not the same as the created or active instance count. The graph is mostly meant for monitoring traffic and is not really designed for cost calculation (it's a line graph, so it's hard to calculate cost from it without doing calculus). It's best to simply use your bill every cycle to track your actual usage.

How to migrate a GAE app to a new config?

I'm updating my web app, which previously used the default configuration. Now I'm starting to use modules, trying this config:
application: newkoolproject
# Other settings here...
version: newkool
runtime: python27
api_version: 1
threadsafe: true
automatic_scaling:
  min_idle_instances: 5
  max_idle_instances: automatic  # default value
  min_pending_latency: automatic  # default value
  max_pending_latency: 30ms
  max_concurrent_requests: 50
I pay for instance hours, data reads, and complex searches, totaling a few dollars daily on my current budget (my limit is 7 USD per day, to avoid sudden spikes due to DoS attacks or other technical issues).
Would it be feasible to try to squeeze my app into the free tier using memcache and other techniques to reduce costs? Or should I forget about reaching the free tier (< 28 instance hours, etc.) and instead build a configuration for a more optimal UX? How would the change affect my costs?
Update 141010 18:59 CET
I could add appstats
You'll need to turn Appstats on on your local App Engine development server and go through your typical user flow. Here are the instructions on how to do this: https://cloud.google.com/appengine/docs/python/tools/appstats
Make sure you turn RPC cost calculation on:
appstats_CALC_RPC_COSTS = True
Once you have gone through a typical user flow, go to localhost:8080/_ah/stats and it will estimate how much specific calls and flows will cost when you get to production. It is a really great tool, as it even helps identify bottlenecks and slow-running areas within your application.
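For reference, on the Python 2.7 runtime Appstats is enabled through the builtins element of app.yaml (the appstats_CALC_RPC_COSTS flag above goes into appengine_config.py):
builtins:
- appstats: on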
Google's recommendation is to not only use memcache, but also to split work into smaller units (as much as possible) by leveraging task queues.
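If you go the task queue route, push queues are declared in a separate queue.yaml; the queue name and rate below are purely illustrative:
queue:
- name: background-work        # illustrative queue name
  rate: 5/s                    # throttle how quickly tasks are dispatched
  max_concurrent_requests: 2   # limit how many tasks run at once, to cap instance usage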
UPDATE: Simple memcache usage example
from google.appengine.api import memcache

def get_results():
    my_results = memcache.get("SOME-KEY-FOR-THIS-ITEM")
    if my_results is None:
        my_results = do_work()  # placeholder for your app's own expensive work
        memcache.set("SOME-KEY-FOR-THIS-ITEM", my_results)
    return my_results
