Why are idle instances not being shut down when there is no traffic?

Some weeks ago, my App Engine app started increasing the number of idle instances to an unreasonably high amount, even with close to zero traffic. This of course impacts my bill, which is skyrocketing.
My app is a simple Node.js application serving a GraphQL API that connects to my Cloud SQL database.
Why are all these idle instances being started?
My app.yaml:
runtime: nodejs12
service: default

handlers:
- url: /.*
  script: auto
  secure: always
  redirect_http_response_code: 301

automatic_scaling:
  max_idle_instances: 1
Screenshot of monitoring:

This is very strange behavior; per the documentation, the number of idle instances should only temporarily exceed max_idle_instances:
Note: When settling back to normal levels after a load spike, the
number of idle instances can temporarily exceed your specified
maximum. However, you will not be charged for more instances than the
maximum number you've specified.
Some possible solutions:
1. Confirm in the console that the deployed configuration is the same as in your local app.yaml.
2. Set min_idle_instances to 1 and max_idle_instances to 2 (temporarily) and redeploy the application (see the sketch after this list). It could be that something is simply wrong on the scaling side, and redeploying the application could solve it.
3. Check your logging (filtered to App Engine) for any problems in shutting down the idle instances.
4. Tweak settings like max_pending_latency. I have seen people build applications that take 2-3 seconds to start up, while the default is only around 30 ms of pending latency before another instance is spun up.
This post suggests setting the following, which you could try:
instance_class: F1
automatic_scaling:
  max_idle_instances: 1           # default value
  min_pending_latency: automatic  # default value
  max_pending_latency: 30ms
5. Switch to basic_scaling, which simply shuts instances down once they have been idle for idle_timeout (a last-resort option). That would look something like this:
basic_scaling:
  max_instances: 5
  idle_timeout: 15m
The solution could of course also be a combination of 2 and 4.
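For suggestion 2, a minimal sketch of the temporary scaling block to redeploy with (the rest of app.yaml stays unchanged):

automatic_scaling:
  min_idle_instances: 1  # temporarily pin one idle instance
  max_idle_instances: 2  # temporary ceiling while the scheduler settles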

Update after 24 hours:
I followed @Nebulastic's suggestions, numbers 2 and 4, but it did not make any difference. So in frustration I disabled the entire Google App Engine application (App Engine > Settings > Disable application), left it off for 10 minutes, and confirmed in the monitoring dashboard that everything was dead (sorry, users!).
After 10 minutes I enabled App Engine again, and it booted only 1 instance. I've been monitoring it closely since, and it (finally) seems to be good now. After the restart it also adheres to the min and max idle instances configuration - @Nebulastic's suggestion. Thanks!
Screenshots:

Have you checked to make sure you don't have a bunch of old versions still running? https://console.cloud.google.com/appengine/versions
Check each service in the services dropdown.
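If you prefer the command line, the same check can be done with the gcloud CLI (SERVICE and VERSION_ID below are placeholders):

gcloud app versions list                               # serving status per service/version
gcloud app versions stop VERSION_ID --service=SERVICE  # stop an old version that is still serving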

Related

How can I reduce App Engine billing cost?

My App Engine YAML file looks something like this:
service: servicename
runtime: php74

automatic_scaling:
  min_idle_instances: 2
  max_pending_latency: 1s

env_variables:
  CLOUD_SQL_CONNECTION_NAME: <MY-PROJECT>:<INSTANCE-REGION>:<MY-DATABASE>
  DB_USER: my-db-user
  DB_PASS: my-db-pass
  DB_NAME: my-db
---
Does automatic scaling cause a higher cost? What is the cheapest configuration I can set? Auto scaling is not mandatory at the current stage of my application.
I think your cheapest configuration is just setting max_instances: 1 and commenting out the other options.
When you have traffic, you will have at most 1 instance. When there's no traffic, your instance goes down (effectively 0).
The downside of this approach (not having min_idle_instances as you currently do) is that the first request after an idle period will take some time, because a new instance has to be started (a cold start).
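A minimal sketch of that configuration, reusing the service and runtime from the question (the env_variables section stays unchanged):

service: servicename
runtime: php74

automatic_scaling:
  max_instances: 1
  # min_idle_instances and max_pending_latency removed so the
  # service can scale down to zero between requests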

App running in Google App Engine fails, tries ah_start for minutes, then restarts

I have a message processor task that runs in App Engine. Many times it appears to die, then spends several minutes logging /_ah/start attempts, then finally restarts.
This task responds to messages from the message queue, then writes data from those messages to a MySQL database.
Looking at the log histogram, it appears that this task is on a 15-minute cycle, where it works for a bit, then does this /_ah/start loop for a bit, then goes back to working.
When I start sending a heavy load of messages to process, it loses messages, which is not acceptable for a production environment.
I really don't know where to even start looking to find out what is going on.
I am sorry, but search as I might, I really cannot find good information on how to use the /_ah/start process. A good link to an explanation and example would be worth a lot.
My process is very simple (sketched below):
start up
wait for message
store data in data base
ack message
go back to wait for next message
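For reference, a minimal sketch of such a worker, assuming Express and the Pub/Sub client library (the subscription name and the storeMessage helper are hypothetical):

const express = require('express');
const {PubSub} = require('@google-cloud/pubsub');

const app = express();

// With manual scaling, App Engine sends GET /_ah/start when the
// instance boots; returning 200 marks the instance as started.
app.get('/_ah/start', (req, res) => res.sendStatus(200));

app.listen(process.env.PORT || 8080, () => {
  const subscription = new PubSub().subscription('message-queue'); // hypothetical name
  subscription.on('message', async (message) => {
    await storeMessage(JSON.parse(message.data)); // hypothetical MySQL write
    message.ack(); // ack only after the data is safely stored
  });
});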
Here is a copy of my app.yaml file:
manual_scaling:
  instances: 1

resources:
  cpu: 1
  memory_gb: 0.5
  disk_size_gb: 10

service: message-processor
runtime: nodejs10

env_variables:
  BUCKET_NAME: "stans_temp"

handlers:
- url: /stylesheets
  static_dir: stylesheets
- url: /.*
  secure: always
  redirect_http_response_code: 301
  script: auto
Thanks for any help.
I would start by correcting the syntax errors in app.yaml.
As I can see runtime: nodejs10 and no env: flex setting, this appears to be the App Engine Standard environment. (app.yaml for standard reference)
However, you have a resources section, which is only for App Engine Flexible. (app.yaml for flexible reference)
App Engine Flex and App Engine Standard are practically two different products, so you need to decide which one you want to use. You can find an article about the differences here. This might be the reason; I am even surprised that this deployed successfully.
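A sketch of what a Standard-environment version of that file might look like, dropping the Flex-only resources section and choosing an instance class instead (manual scaling on Standard uses B-class instances; B2 is a rough stand-in for the old 0.5 GB setting):

service: message-processor
runtime: nodejs10
instance_class: B2

manual_scaling:
  instances: 1

env_variables:
  BUCKET_NAME: "stans_temp"

handlers:
- url: /stylesheets
  static_dir: stylesheets
- url: /.*
  secure: always
  redirect_http_response_code: 301
  script: auto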

Google App Engine - spawns new instance for every connection or has zero instances

I am noticing something a little odd with Google App Engine. If my app has not been used for a while and I open it, it takes some time to load. I also see in the GAE logs console that it is starting up a server during this time, which accounts for the wait (why not always have an instance running?).
After I open and close the app a couple of times, I then notice in the versions tab of GAE that I have 7 running instances (all on the same version).
I'm a little confused about how GAE works. Does it scale your instances down to 0 when there are no requests for a while, and then, on the flip side, does it spin up a new instance for every new client connecting?
My app.yaml looks like this:
runtime: nodejs10
env: standard
instance_class: F2

handlers:
- url: /.*
  secure: always
  redirect_http_response_code: 301
  script: auto
You need to fine-tune your App Engine scaling strategy. For example, consider this app.yaml file:
runtime: nodejs10
env: standard
instance_class: F2

handlers:
- url: /.*
  secure: always
  redirect_http_response_code: 301
  script: auto

automatic_scaling:
  min_instances: 1
  max_instances: 4
  min_idle_instances: 1
  max_concurrent_requests: 25
  target_throughput_utilization: 0.8

inbound_services:
- warmup
min_instances and min_idle_instances are set to 1 in order to always have at least one instance ready for incoming requests and avoid cold starts.
To avoid spinning up new instances too quickly, you can set max_concurrent_requests and target_throughput_utilization; in this example, a new instance will not be spun up until an existing instance reaches 20 concurrent requests (25 x 0.8).
As mentioned in this document, you need to create a warmup endpoint in your application and add inbound_services to your app.yaml file, for example:
app.get('/_ah/warmup', (req, res) => {
  // Handle your warmup logic: initiate DB connections, caches, etc.
  res.sendStatus(200); // respond so the warmup request completes successfully
});
Warmup calls have the benefit of preparing your instances before incoming requests arrive, reducing the latency of the first request.
As you did not specify any scaling setting in your app.yaml, App Engine uses automatic scaling.
That means the application has 0 minimum instances, so when your app is not receiving any requests at all, it scales down to 0. With that option you save the cost of having an instance running all the time, but cold starts will happen. A cold start happens each time a request reaches your application when there are no instances ready to serve it and a new one has to be created.
Regarding your application scaling up to 7 instances when the traffic load increases: again, this depends on the workload it is receiving. You can control this behaviour with the max_instances setting, although using a low value could affect your application's performance if more instances are needed.
App Engine spins up new instances when the threshold for target_cpu_utilization, target_throughput_utilization, max_concurrent_requests, max_pending_latency, or min_pending_latency is reached (see below for how these look in app.yaml). You can read about all of them here.
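For illustration, a sketch of those thresholds in an automatic_scaling block (the values shown are illustrative, not recommendations):

automatic_scaling:
  target_cpu_utilization: 0.65
  target_throughput_utilization: 0.8
  max_concurrent_requests: 25
  min_pending_latency: 30ms
  max_pending_latency: automatic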

Frequent restarts on Google App Engine Standard second generation

We have a problem with frequent restarts of our App Engine instances, which only last for 15-30 minutes, sometimes maybe 1 hour.
In the last 24 hours, we had 72 instance restarts. We have looked into the logs but can't find any error messages explaining this.
min_instances is set to 1.
The app is a PHP CodeIgniter app running on the php73 runtime.
Maybe this is relevant, as it shows up regularly in the log, not at the same time as web requests:
A 2020-05-01T17:46:46.675532Z [start] 2020/05/01 17:46:46.674713 Quitting on terminated signal
A 2020-05-01T17:46:46.900441Z [start] 2020/05/01 17:46:46.899377 Start program failed: termination triggered by nginx exit
Looking at the request log, there seems to be no pattern in page requests that could lead to instances crashing.
All page requests typically load in 1-80 ms; there are no heavy scripts. It looks like the instances crash while idle.
We have also tried increasing the instance type to F4, with the same results.
The graphs for CPU usage and memory usage don't give us any clue.
The problem with this is loading requests for site visitors. Most of the time the site is fast and responsive, but there can be occasional loading times of 1 s+ when new instances start. We have set up warmup requests, but that does not cover all instance starts.
Is this normal behavior? How can we debug further? Any clue what could be wrong?
Thanks for any help.
EDIT: Here is our app.yaml:
runtime: php73
entrypoint: serve public_html/index.php
instance_class: F2

automatic_scaling:
  min_instances: 1

inbound_services:
- warmup

vpc_access_connector:
  name: "xx"

handlers:
- url: /
  script: auto
  secure: always
- url: /(.+)
  script: auto
  secure: always

env_variables:
  CLOUD_SQL_CONNECTION_NAME: xx
  REDIS_HOST: xx
  REDIS_PORT: xx

Rolling restarts are causing our App Engine app to go offline. Is there a way to change the config to prevent that from happening?

About once a week our Flexible-environment App Engine Node app goes offline, and the following line appears in the logs: Restarting batch of VMs for version 20181008t134234 as part of rolling restart. We have our app set to automatic scaling with the following settings:
runtime: nodejs
env: flex

beta_settings:
  cloud_sql_instances: tuzag-v2:us-east4:tuzag-db

automatic_scaling:
  min_num_instances: 1
  max_num_instances: 3

liveness_check:
  path: "/"
  check_interval_sec: 30
  timeout_sec: 4
  failure_threshold: 2
  success_threshold: 2

readiness_check:
  path: "/"
  check_interval_sec: 15
  timeout_sec: 4
  failure_threshold: 2
  success_threshold: 2
  app_start_timeout_sec: 300

resources:
  cpu: 1
  memory_gb: 1
  disk_size_gb: 10
I understand the rolling restarts of GCP/GAE, but am confused as to why Google isn't spinning up another VM before taking our primary one offline. Do we have to run with a minimum of 2 instances to prevent this from happening? Is there a way to configure my app.yaml to make sure another instance is spun up before it reboots the only running instance? After the reboot finishes, everything comes back online fine, but there is still 10 minutes of downtime, which isn't acceptable, especially considering we can't control when it reboots.
We know that it is expected behaviour that Flexible instances are restarted on a weekly basis. Provided that health checks are properly configured and are not the issue, the recommendation is, indeed, to set up a minimum of two instances.
There is no alternative functionality in App Engine Flex, of which I am aware, that raises a new instance to avoid downtime as a result of a weekly restart. You could try running directly on Google Compute Engine instead of App Engine and managing updates and maintenance yourself; perhaps that would suit your purpose better.
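A minimal sketch of that change, applied to the scaling block from the question:

automatic_scaling:
  min_num_instances: 2  # keep a second VM alive so a rolling restart of
  max_num_instances: 3  # one instance does not take the whole app offline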
Are you just guessing this based on that num instances graph in the app engine dashboard? Or is your app engine project actually unresponsive during that time?
You could use cron to hit it every 5 minutes to see if it's responsive.
Does this issue persist if you change cool_down_period_sec & target_utilization back to their defaults?
If your service is truly down during that time, maybe you should implement a request handler for liveness checks:
https://cloud.google.com/appengine/docs/flexible/python/reference/app-yaml#updated_health_checks
The default polling config would tell GAE to relaunch the instance within a couple of minutes.
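A minimal sketch of such a handler, assuming an Express app (the path must match liveness_check.path in app.yaml; the question's config polls "/", while /liveness_check is the default):

// App Engine Flex polls this path and replaces the instance
// when it stops receiving successful responses.
app.get('/liveness_check', (req, res) => {
  // Report healthy only when the app can actually serve traffic,
  // e.g. once the DB connection pool has been initialized.
  res.status(200).send('ok');
});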
Another thing worth double checking is how long it takes your instance to start up.
