Set the minimum number of instances - google-app-engine

I understand that App Engine handles scaling automatically. However, in order to test drive some multi-instance / consolidated state scenarios, I'd like to instruct App Engine to fire up a minimum of 5 instances, even if load does not justify this.
Is there a way of doing this via app.yaml or Dashboard?

Try setting the Min Pending Latency to a very low number (e.g. 100ms), and then send a burst of requests to your app. Then the scheduler will start spinning up multiple instances to handle these requests.
You may need to use a tool for automated load testing - it will be difficult to achieve this manually.
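For reference, this is roughly how that setting looks in app.yaml (a minimal sketch; 100ms is just the example value from above):
automatic_scaling:
  # A low pending latency makes the scheduler start new instances sooner
  min_pending_latency: 100ms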

This is what the min_idle_instances value controls.
In app.yaml:
automatic_scaling:
  min_idle_instances: 5

You can use either automatic_scaling or manual_scaling. With automatic scaling you can only influence how many instances are started; with manual scaling you decide exactly how many instances you wish to use. Place the following in your app.yaml, or in the module's yaml file:
manual_scaling:
  instances: 5

Related

CloudTasks not scaling GAE Backend instances

I was searching for a solution which would allow me to run tasks up to 24h long. The combination of Cloud Tasks and multiple AppEngine Backend instances seemed like a perfect way to go.
As the tasks are long running, I would like to scale to max_instances as fast as possible, but I am having trouble doing so.
Here is my app.yaml
service: slow
runtime: python37
# --timeout=90000 (25h) -> AppEngine Backend Instance should raise TimeoutExceededError after 24h
entrypoint: gunicorn main:app --workers 1 --timeout=90000
instance_class: B2
basic_scaling:
  max_instances: 15
Here is a screenshot of my Cloud Task Queue configuration.
My issue is that the tasks from the Cloud Task Queue are not spawning new instances as I would expect (e.g. 15 max_concurrent_tasks in the queue settings should spawn 15 backend instances).
I somehow managed to overcome this issue by aggressively increasing max_concurrent_tasks in the queue configuration (200 max_concurrent_tasks will spawn 15 backend instances).
Unfortunately, as the number of tasks in queue decreases, the backend instances will start terminating.
Now there are 8 tasks left in the queue (out of several hundred) and only 1 backend instance, which is running only 1 task. I cannot trigger additional instances even by clicking the "RUN TASK" button in the Cloud Tasks web UI.
Has anyone come across a similar issue?
Do you have any hint as to why this might be happening?
Why doesn't Cloud Tasks hit the /_ah/start endpoint to spin up a new instance to run on?
I am not sure basic scaling is a good idea. According to the documentation:
If you use basic scaling, App Engine attempts to keep your cost low,
even though that may result in higher latency as the volume of
incoming requests increases
It seems that automatic scaling would be a better idea. If you take a look at the same document, you can find:
If you use automatic scaling, each instance in your app has its own
queue for incoming requests. Before the queues become long enough to
have a noticeable effect on your app's latency, App Engine
automatically creates one or more new instances to handle the
increasing load.
You can configure the settings for automatic scaling to achieve a
trade-off between the performance you want and the cost you can incur.
The documentation mentions 3 settings that can be used:
Target CPU Utilization
Target Throughput Utilization
Max Concurrent Requests
I think you should be able to find the configuration that will serve you best with those 3.
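For illustration, a minimal app.yaml sketch using those three settings (the values are placeholders, not recommendations):
automatic_scaling:
  max_instances: 15
  # Start new instances once existing ones reach ~60% CPU
  target_cpu_utilization: 0.6
  # Start new instances once ~60% of the allowed concurrent requests are in use
  target_throughput_utilization: 0.6
  # Concurrent requests a single instance will accept
  max_concurrent_requests: 10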

Why would GAE not distribute the load to the other available instances?

First thing first, here is my app.yaml:
runtime: nodejs10
env: standard
instance_class: F1
handlers:
- url: /.*
  script: auto
automatic_scaling:
  min_instances: 1
  max_instances: 20
inbound_services:
- warmup
I'm using Apache Benchmark for this:
ab -c30 -n100000 "${URL}"
What I notice in the GAE console is that I have 8 instances available but only 3 take on 99% of the work. The rest is serving either no request or a very small portion.
Any idea what the problem could be here?
I would recommend using the “max_concurrent_requests” element in your “app.yaml” file, as this element sets the number of concurrent requests an automatic scaling instance can accept before the scheduler spawns a new instance (keep in mind that the maximum limit is 80).
Furthermore, you can also set “max_pending_latency”, which specifies the maximum amount of time that App Engine should allow a request to wait in the pending queue before starting additional instances to handle requests, so that pending latency is reduced.
If you reach the limit, it will be a signal to scale up, so the number of instances will be increased.
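A rough app.yaml sketch of that suggestion (the values here are hypothetical and would need tuning for your app):
automatic_scaling:
  min_instances: 1
  max_instances: 20
  # Each instance accepts up to this many concurrent requests (the hard limit is 80)
  max_concurrent_requests: 30
  # Don't let requests wait longer than this before additional instances are started
  max_pending_latency: 1s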
The fact that the load is not evenly distributed across the running instances is normal and actually desired as long as the number of instances still processing requests is sufficient to handle the current load level with a satisfactory level of performance - this allows the other instances to be idle long enough to be automatically shutdown (due to inactivity).
This is part of the dynamic instance management logic used with automatic and basic scaling.

What is the difference between min-instances and min-idle-instances in google app engine?

I want to understand the difference between min-instances & min-idle-instances?
I saw documentation on https://cloud.google.com/appengine/docs/standard/java/config/appref#scaling_elements but I am not able to differentiate between the two.
My use case:
I want at least 1 instance always up, because otherwise in most cases GAE would take time creating an instance, causing my requests to time out (in the case of basic scaling).
It should stay up whether there is traffic or not, and if a request comes it should serve it immediately. If request volume grows, then it should scale.
Which one I should use?
min-idle-instances refers to the instances that are kept ready to support your application in case you receive high traffic or CPU-intensive tasks, unlike min_instances, which are the instances used to process incoming requests immediately. I suggest you take a look at this link for a deeper explanation of idle instances.
Based on this, since your use case is focused on serving incoming requests immediately, I think you should go with min_instances and use min-idle-instances only if you want to be ready for sudden load spikes.
The min-instances configuration applies to dynamic instances while min-idle-instances applies to idle/resident instances.
See also:
Introduction to instances for a description of the 2 instance types
Why do more requests go to new (dynamic) instances than to resident instance? for a bit more details
min_instances: the minimum number of instances running at any time, traffic or no traffic, rain or shine.
min_idle_instances: the minimum number of idle (or "unused") instances running on top of the currently used instances. Example: you automatically scaled to 5 App Engine instances that are receiving requests; by setting min_idle_instances to 2, you will be running 7 instances in total, with the 2 "extra" instances idle and waiting in case you receive more load. The goal is that when load rises, your users don't have to wait the time it takes to start up an instance.
IMPORTANT: you need to configure warmup requests for that to work
IMPORTANT2: you'll be billed for any instance running, idle or not. App engine is not cheap so be careful.
min_instances applies to the number of instances that you want to have running, from 0 (useful if you want to scale down when you don't receive traffic) to 1000. You are charged for the number of instances you have running, so, this is important to save costs.
For your case set this value to 1, as it's the most straightforward option.
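Putting the two together, a minimal app.yaml sketch for this use case could look like the following (the numbers are illustrative):
inbound_services:
- warmup  # required so new and idle instances can be warmed up
automatic_scaling:
  # Keep at least one instance running at all times, traffic or not
  min_instances: 1
  # Keep two spare idle instances ready for load spikes (these are billed too)
  min_idle_instances: 2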

How to limit instance spawning (autoscaling) on GAE?

I have noticed a recent surge in instance spawning on GAE.
In my app.yaml I have clearly defined that max two instances should be created at a time.
application: xxx
version: 1-6-0
runtime: python27
api_version: 1
instance_class: F2
automatic_scaling:
  max_idle_instances: 2
threadsafe: true
However the dashboard is showing 4 instances and the bills are going up. How can I stop this madness? :)
I did a lot of research into this.
F instances are automatically scaled. There is no way to limit that. Hence it makes sense to move the actual work away from frontend instances and put that into a backend instance (B1 or B2). The latter provides another 8 hours free quota.
The real challenge is to re-architect the app to use a default app.yaml for web statics, a mobile.yaml for mobile requests with shorter min_pending_latency and a backend.yaml (B2) instance for handling the tasks and calculations.
All this needs to be routed properly via a dispatch.yaml. In this file you can specify which url endpoint will be handled effectively by which module.
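A dispatch.yaml for that layout could look roughly like this (the module names are hypothetical, matching the mobile and backend modules described above; the first matching rule wins):
dispatch:
  # Mobile requests go to the module defined in mobile.yaml
  - url: "*/mobile/*"
    module: mobile
  # Tasks and calculations go to the backend module (backend.yaml)
  - url: "*/tasks/*"
    module: backend
  # Everything else is served by the default module (web statics)
  - url: "*/*"
    module: default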
The best way to understand it is to look at this excellent example from GAE:
It makes sense to get it working in the local environment first, before trying anything on the remote server.
dev_appserver.py dispatch.yaml app.yaml mobile.yaml backend.yaml
Also this official documentation explains some of the above in more detail.
It's pretty impressive what can be achieved with GAE.
max_idle_instances
The maximum number of IDLE instances that App Engine should maintain
for this version.
It seems that it is not currently possible to set the max number of instances for an automatic scaling module. As @DoIT suggests, you can set a spending limit; however, keep in mind the below.
When an application exceeds its daily spending limit, any operation
whose free quota has been exhausted fails.
So if you need to control somehow the total number of instances and keep your service running, I see the following possibilities.
Change your scaling type to basic and set the max_instances parameter as you like (a minimal sketch follows after this list)
Keep automatic scaling type and increase min_pending_latency and max_concurrent_requests parameters (multi-threading has to be enabled)
You can find more details here.
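For the first option, a minimal sketch could be (note that basic scaling runs on B instance classes rather than F, so the instance class has to change as well):
instance_class: B2
basic_scaling:
  # Never run more than two instances at a time
  max_instances: 2
  # Shut an instance down after it has been idle for this long
  idle_timeout: 10m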
max_idle_instances sets the max number of idle instances, i.e. instances that are waiting for a traffic spike. From your screenshot, it looks like all instances are getting traffic, so it looks OK to me. You can set a max daily budget if you want to control your spend on GAE.
It's possible that some of your requests are taking too long to complete, and thus cause new instances to be spawned. You could work around this (I'm told, but have yet to try it myself) by setting a high value for your min_pending_latency property. This could hurt your latency a little, but would also limit the rate of instance spawning.
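In app.yaml that workaround would look something like the sketch below (7.5s is an arbitrary illustration of a "high" value):
automatic_scaling:
  # Let requests wait longer in the pending queue before new instances are spawned
  min_pending_latency: 7.5s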

How to migrate gae app to new config?

I'm updating my webapp, which previously used the default configuration. Now I'm starting to use modules, trying this config:
application: newkoolproject
# Other settings here...
version: newkool
runtime: python27
api_version: 1
threadsafe: true
automatic_scaling:
  min_idle_instances: 5
  max_idle_instances: automatic  # default value
  min_pending_latency: automatic  # default value
  max_pending_latency: 30ms
  max_concurrent_requests: 50
I pay for instance hours, data reads and complex searches, totalling a few dollars daily on my current budget (my limit is 7 USD per day to avoid sudden spikes due to DoSing or other technical issues).
Could it be feasible for me to try and squeeze my app into the free tier using memcache and other technologies to reduce costs? Or should I forget about reaching the free tier (< 28 instance hours etc.) and instead make a configuration for a more optimal UX? How will the change affect my costs?
Update 141010 18:59 CET
I could add appstats
You'll need to turn Appstats on on your local App Engine development server and go through your typical user flow. Here are the instructions on how to do this: https://cloud.google.com/appengine/docs/python/tools/appstats
Make sure you turn calculate RPC costs on:
appstats_CALC_RPC_COSTS = True
Once you go through a typical user flow, you would go to localhost:8080/_ah/stats and it will estimate how much specific calls and flows will cost when you get to production. Really great tool, as it even helps identify bottlenecks and slow running areas within your application.
Google's recommendation is to not only use memcache, but also to split work into smaller units (as much as possible) by leveraging task queues.
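If you go the task queue route, the queues themselves are defined in a queue.yaml; a minimal sketch (the queue name, rate and concurrency here are placeholders):
queue:
- name: background-work
  # How fast tasks are dispatched from this queue
  rate: 5/s
  # How many tasks may execute at the same time
  max_concurrent_requests: 10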
UPDATE: Simple memcache usage example
from google.appengine.api import memcache

def get_results():
    # Check the cache first to avoid repeating expensive work
    my_results = memcache.get("SOME-KEY-FOR-THIS-ITEM")
    if my_results is None:
        my_results = ...  # Do the expensive work here
        memcache.set("SOME-KEY-FOR-THIS-ITEM", my_results)
    return my_results
