First things first, here is my app.yaml:
runtime: nodejs10
env: standard
instance_class: F1
handlers:
- url: /.*
  script: auto
automatic_scaling:
  min_instances: 1
  max_instances: 20
inbound_services:
- warmup
I'm using ApacheBench (ab) for this:
ab -c30 -n100000 "${URL}"
What I notice in the GAE console is that I have 8 instances available, but only 3 of them take on 99% of the work; the rest serve either no requests or a very small portion.
Any idea what the problem could be here?
I would recommend using the "max_concurrent_requests" element in your "app.yaml" file: it sets the number of concurrent requests an automatic-scaling instance can accept before the scheduler spawns a new instance (keep in mind that the maximum limit is 80).
Furthermore, you can also set "max_pending_latency", which specifies the maximum amount of time App Engine should allow a request to wait in the pending queue before starting additional instances to handle requests, so that pending latency is reduced.
If you reach the limit, it will be a signal to scale up, so the number of instances will be increased.
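As a sketch of how those two elements would fit into the app.yaml above (the values are placeholders to tune, not recommendations):

automatic_scaling:
  min_instances: 1
  max_instances: 20
  max_concurrent_requests: 40  # an instance accepts up to 40 concurrent requests (hard cap is 80)
  max_pending_latency: 100ms   # start new instances once requests wait longer than this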
The fact that the load is not evenly distributed across the running instances is normal and actually desired, as long as the number of instances still processing requests is sufficient to handle the current load with a satisfactory level of performance; this allows the other instances to be idle long enough to be automatically shut down (due to inactivity).
This is part of the dynamic instance management logic used with automatic and basic scaling.
Related
I was searching for a solution that would allow me to run tasks up to 24 hours long. The combination of Cloud Tasks and multiple App Engine backend instances seemed like the perfect way to go.
As the tasks are long-running, I would like to scale to max_instances as fast as possible, but I am having trouble doing so.
Here is my app.yaml:
service: slow
runtime: python37
# --timeout=90000 (25h) -> AppEngine Backend Instance should raise TimeoutExceededError after 24h
entrypoint: gunicorn main:app --workers 1 --timeout=90000
instance_class: B2
basic_scaling:
  max_instances: 15
Here is a screenshot of my Cloud Tasks queue configuration.
My issue is that the tasks from the Cloud Tasks queue are not spawning new instances as I would expect (e.g. max_concurrent_tasks set to 15 in the queue settings should spawn 15 backend instances).
I somehow managed to work around this by aggressively increasing max_concurrent_tasks in the queue configuration (200 max_concurrent_tasks will spawn 15 backend instances).
Unfortunately, as the number of tasks in the queue decreases, the backend instances start terminating.
Right now there are 8 tasks left in the queue (out of several hundred) and only 1 backend instance, which is running 1 task. I cannot trigger additional instances even by clicking the "RUN TASK" button in the Cloud Tasks web UI.
Has anyone come across a similar issue?
Do you have any hint as to why this might be happening?
Why doesn't Cloud Tasks hit the /_ah/start endpoint to spin up a new instance to run on?
I am not sure basic scaling is a good idea here. According to the documentation:
If you use basic scaling, App Engine attempts to keep your cost low, even though that may result in higher latency as the volume of incoming requests increases.
It seems that automatic scaling would be a better idea. In the same document you can find:
If you use automatic scaling, each instance in your app has its own queue for incoming requests. Before the queues become long enough to have a noticeable effect on your app's latency, App Engine automatically creates one or more new instances to handle the increasing load.
You can configure the settings for automatic scaling to achieve a trade-off between the performance you want and the cost you can incur.
The documentation mentions 3 settings that can be used:
Target CPU Utilization
Target Throughput Utilization
Max Concurrent Requests
I think you should be able to find the configuration that serves you best with those three.
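As a rough sketch of what that could look like (note that automatic scaling requires an F instance class instead of B2, and the values here are placeholders, not recommendations):

service: slow
runtime: python37
instance_class: F2
automatic_scaling:
  max_instances: 15
  target_cpu_utilization: 0.65        # start new instances when CPU use crosses 65%
  target_throughput_utilization: 0.6  # scale out at 60% of max_concurrent_requests
  max_concurrent_requests: 10         # concurrent requests one instance accepts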
My App Engine service is written in Go. I have code that connects to Cloud Datastore before the server even listens on the port. There is a single projection query that takes about 500ms to read just 4 entities. Does the first interaction with Datastore have higher latency because a connection needs to be established? Is there any way this Datastore connection latency can be reduced? Also, is there any difference between doing this db call before listening on the port versus doing it within the warmup request (this is an autoscaled instance)?
Similar to the high initial latency for Cloud Datastore, I see the same pattern for Cloud Tasks. Initial task creation can be as high as 500ms, but even subsequent creations are anywhere from 200 to 400ms. This is in us-central. I was actually considering moving a db update to a background task, but in general I am seeing the latency of task creation be more or less the same as that of a transaction that reads and updates the data, giving no net benefit.
Finally, instance startup time is typically 2.5 to 3 seconds, with main getting called after about 2 seconds. My app's startup cost is the above-mentioned projection query (500ms) and nothing else. So, no matter how much I optimize my app startup, should I assume an additional latency of about 2 seconds?
Note that the load on the system is very light so these issues can't be because of high volume.
Update: deployment files, as requested by Miguel (this is a test environment for investigating performance characteristics; the prod deployment will be more generous with instances)
default app:
service: default
runtime: go112
instance_class: F1
automatic_scaling:
  min_instances: 0
  max_instances: 1
  min_idle_instances: 1
  max_idle_instances: 1
  min_pending_latency: 200ms
  max_pending_latency: 500ms
  max_concurrent_requests: 10
  target_cpu_utilization: 0.9
  target_throughput_utilization: 0.9
inbound_services:
- warmup
backend app:
service: backend-services
runtime: go112
instance_class: B1
basic_scaling:
  idle_timeout: 1m
  max_instances: 1
200-500ms to initialize a client seems reasonable because a remote connection is being established. A 1-2 second cold start for App Engine also seems normal.
As you mentioned, you can experiment with a warmup request to reduce cold starts and initialize clients.
I would also recommend looking into the mode you are running your Datastore in (Native vs. Datastore mode). There is increased latency when using Datastore mode; for more info, see Cloud Datastore Best Practices.
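On the "before listening vs. in the warmup request" point, here is a minimal Go sketch of the warmup approach (this is an assumption about your setup, not your actual code; GOOGLE_CLOUD_PROJECT and PORT are environment variables the Go standard runtime sets). Creating the client in main means the ~200-500ms connection cost is paid during instance startup, which, with inbound_services: warmup enabled, is usually triggered by a warmup request rather than a user-facing one:

package main

import (
	"context"
	"fmt"
	"log"
	"net/http"
	"os"

	"cloud.google.com/go/datastore"
)

var dsClient *datastore.Client

func main() {
	ctx := context.Background()

	// Pay the client-connection cost during startup, before the instance
	// starts accepting traffic on the port.
	var err error
	dsClient, err = datastore.NewClient(ctx, os.Getenv("GOOGLE_CLOUD_PROJECT"))
	if err != nil {
		log.Fatalf("datastore.NewClient: %v", err)
	}

	// App Engine calls /_ah/warmup when inbound_services: warmup is enabled;
	// by the time this returns, the client above is already initialized.
	http.HandleFunc("/_ah/warmup", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	http.HandleFunc("/", handle)

	port := os.Getenv("PORT")
	if port == "" {
		port = "8080"
	}
	log.Fatal(http.ListenAndServe(":"+port, nil))
}

func handle(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, "ok")
}

Note that warmup requests are best-effort: App Engine can still send a loading request to a brand-new instance, so keeping the initialization in main (rather than only inside the /_ah/warmup handler) covers both paths.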
I am using GAE for heavy tasks with high memory needs, and I got the following error:
Exceeded soft memory limit of 512 MB with 561 MB after servicing 3 requests total. Consider setting a larger instance class in app.yaml.
As the tasks are expensive, I assume two of them can work within one instance. But it does not work for three:
While handling this request, the process that handled this request was found to be using too much memory and was terminated. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may have a memory leak in your application or may be using an instance with insufficient memory. Consider setting a larger instance class in app.yaml.
My current settings:
runtime: nodejs8
instance_class: B4
basic_scaling:
  max_instances: 10
  idle_timeout: 1m
I also tried with these settings:
runtime: nodejs8
instance_class: F4
automatic_scaling:
  target_cpu_utilization: 0.5
  max_instances: 10
The task execution fails with "Exceeded soft memory limit".
So, to solve this error, I think the scale-out should be based on memory utilization rather than CPU utilization.
How can scaling out be done when memory utilization exceeds the limit?
The decisions of the dynamic instance scheduler are not based on instance memory usage. From Scaling dynamic instances:
The App Engine scheduler decides whether to serve each new request with an existing instance (either one that is idle or one that accepts concurrent requests), put the request in a pending request queue, or start a new instance for that request. The decision takes into account the number of available instances, how quickly your application has been serving requests (its latency), and how long it takes to start a new instance.
The error messages you received all show a very low number of requests being handled before the error happens, which indicates that the memory starvation is very strong.
What isn't clear is whether the high memory usage comes from a single request or from multiple concurrent requests - i.e. requests hitting an instance at the same time (or too close to each other for the memory-freeing mechanism, if any, to keep up). But that can be determined experimentally.
If multiple concurrent incoming requests are what drives the instance memory usage over the threshold, you can try to deal with it by tuning the target_throughput_utilization and/or max_concurrent_requests knobs:
Target Throughput Utilization: sets the throughput threshold for the number of concurrent requests after which more instances will be started to handle traffic.
Max Concurrent Requests: sets the max concurrent requests an instance can accept before the scheduler spawns a new instance.
If the memory limit is hit even without the instance handling multiple requests simultaneously (or if you would like an instance to serve more requests concurrently - typically to reduce costs), the only way to address it is to use an instance class with more memory. The F4 and B4 classes you tried both have 512 MB; try F4_1G/B4_1G, both of which have 1 GB.
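A sketch combining both ideas, assuming the nodejs8 automatic-scaling setup from the question (the values are illustrative, not recommendations):

runtime: nodejs8
instance_class: F4_1G            # 1 GB of memory instead of 512 MB
automatic_scaling:
  max_instances: 10
  max_concurrent_requests: 2     # bound per-instance memory by limiting simultaneous requests
  target_throughput_utilization: 0.6  # scale out at 60% of max_concurrent_requests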
I have noticed a recent surge in instance spawning on GAE.
In my app.yaml I have clearly defined that at most two instances should be created at a time.
application: xxx
version: 1-6-0
runtime: python27
api_version: 1
instance_class: F2
automatic_scaling:
  max_idle_instances: 2
threadsafe: true
However, the dashboard is showing 4 instances and the bills are going up. How can I stop this madness? :)
I did a lot of research into this.
F instances are automatically scaled, and there is no way to limit that. Hence it makes sense to move the actual work away from frontend instances and into a backend instance (B1 or B2). The latter provides another 8 hours of free quota.
The real challenge is to re-architect the app to use a default app.yaml for web statics, a mobile.yaml for mobile requests with a shorter min_pending_latency, and a backend.yaml (B2) instance for handling the tasks and calculations.
All this needs to be routed properly via a dispatch.yaml. In this file you specify which URL endpoint is handled by which module; a sketch follows below.
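As a minimal sketch of such a dispatch.yaml (the module names and URL patterns are hypothetical placeholders matching the yaml files described above, not values from the original setup):

dispatch:
- url: "*/mobile/*"
  module: mobile
- url: "*/tasks/*"
  module: backend
# anything not matched above falls through to the default module serving the statics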
The best way to understand it is to look at the excellent example from GAE. It makes sense to try to make it work in a local environment first, before trying anything on the remote server:
dev_appserver.py dispatch.yaml app.yaml mobile.yaml backend.yaml
This official documentation also explains some of the above in more detail.
It's pretty impressive what can be achieved with GAE.
max_idle_instances
The maximum number of idle instances that App Engine should maintain for this version.
It seems that it is not currently possible to set the max number of instances for a module using automatic scaling. As @DoIT suggests, you can set a spending limit; however, keep in mind the below:
When an application exceeds its daily spending limit, any operation whose free quota has been exhausted fails.
So, if you need to somehow control the total number of instances and keep your service running, I see the following possibilities:
Change your scaling type to basic and set the max_instances parameter as you like (a sketch follows below)
Keep the automatic scaling type and increase the min_pending_latency and max_concurrent_requests parameters (multi-threading has to be enabled)
You can find more details here.
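A sketch of the first option (the numbers are placeholders to adjust to your traffic):

instance_class: B2
basic_scaling:
  max_instances: 4
  idle_timeout: 10m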
The max_idle_instances setting controls the max number of idle instances, i.e. instances that are kept around waiting for a traffic spike. From your screenshot it looks like all instances are getting traffic, so that seems OK to me. You can set a max daily budget if you want to control your spend on GAE.
It's possible that some of your requests are taking too long to complete, and thus cause new instances to be spawned. You could work around this (I'm told, but have yet to try it myself) by setting a high value for your min_pending_latency property. This could hurt your latency a little, but would also limit the rate of instance spawning.
I understand that App Engine handles scaling automatically. However, in order to test-drive some multi-instance / consolidated-state scenarios, I'd like to instruct App Engine to fire up a minimum of 5 instances, even if the load does not justify it.
Is there a way of doing this via app.yaml or the Dashboard?
Try setting the Min Pending Latency to a very low number (e.g. 100ms), and then send a burst of requests to your app; the scheduler will then start spinning up multiple instances to handle them.
You may need to use a tool for automated load testing - it will be difficult to achieve this manually.
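For example, reusing the ab invocation from the first question (the concurrency and request count are placeholders):
ab -c 50 -n 20000 "${URL}"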
This is what the min_idle_instances value controls.
In app.yaml:
automatic_scaling:
  min_idle_instances: 5
It is possible to use either automatic_scaling or manual_scaling. With automatic scaling, you can only influence how many instances are started; with manual scaling, you decide exactly how many instances you wish to use. Place the following in your app.yaml, or in your module's yaml file:
manual_scaling:
  instances: 5