App Engine Node.js flexible spawns 2 instances after deployment

I have a pretty basic app.yaml file with the following:
runtime: nodejs
env: flex
service: front
And every time I deploy the application, the deployment takes a very long time at this step:
Updating service [front] (this may take several minutes)...
When I check the console, I can see that it goes from 1 instance up to 2, even though I didn't specify anything about the number of instances. Why is Google doing this? And how can we set the starting number of instances without disabling the autoscaling feature? Thanks in advance!

On App Engine Flexible applications, the minimum number of instances given to your service defaults to 2 to reduce latency. This is documented here.
You can configure these settings differently by adding them in your app.yaml file like this:
runtime: nodejs
env: flex
service: front
automatic_scaling:
  min_num_instances: 1  # Default is 2. Must be 1 or greater.
  max_num_instances: 10  # Default is 20.


How can I reduce App Engine billing cost?

My App Engine YAML file looks something like this:
service: servicename
runtime: php74
automatic_scaling:
  min_idle_instances: 2
  max_pending_latency: 1s
env_variables:
  DB_USER: my-db-user
  DB_PASS: my-db-pass
  DB_NAME: my-db
Does automatic scaling cause higher costs? What is the cheapest configuration I can set? Auto scaling is not mandatory at the current stage of my application.
I think your cheapest configuration is just setting max_instances: 1 and commenting out the other options.
When you have traffic, the maximum number of instances that you will have will be 1. When there's no traffic, your instance goes down (effectively 0).
The downside of this approach (not having min_idle_instances as you currently do) is that the first request after an idle period will be slow, because it has to wait for an instance to start.

ERROR: The requested amount of instances has exceeded GCE's default quota

I decided to use App Engine Flexible. But I am getting this error:
The requested amount of instances has exceeded GCE's default quota. Please see for more information on GCE resources
I have a billing account connected and have $300 in credit.
My app.yaml:
runtime: nodejs
service: server
env: flex
session_affinity: true
handlers:
- url: /.*
  secure: always
  redirect_http_response_code: 301
  script: auto
I've been trying to figure out how to fix this for a whole day now :(
Does anyone understand why this is so?
As suggested by @mahboob, the answer to this question is given in this question.
According to the GCP documentation for the max_num_instances parameter, the maximum number of instances in your project is 8 by default, whereas you appear to be using 15. Increasing the quota limit for your project should solve the issue.
If you just want to get through the deployment and don't care about scaling, the following might help.
Delete as many old versions and instances as possible, and repeat the deletion several times, because a deletion can fail.
Then, in your app.yaml, limit the service to a single instance and try to deploy again:
manual_scaling:
  instances: 1

Why are idle instances not being shut down when there is no traffic?

Some weeks ago my app on App Engine started to increase the number of idle instances to an unreasonably high amount, even when there is close to zero traffic. This of course impacts my bill, which is skyrocketing.
My app is a simple Node.js application serving a GraphQL API that connects to my Cloud SQL database.
Why are all these idle instances being started?
My app.yaml:
runtime: nodejs12
service: default
handlers:
- url: /.*
  script: auto
  secure: always
  redirect_http_response_code: 301
automatic_scaling:
  max_idle_instances: 1
Screenshot of monitoring:
This is very strange behavior. According to the documentation, the number of idle instances should only temporarily exceed max_idle_instances.
Note: When settling back to normal levels after a load spike, the
number of idle instances can temporarily exceed your specified
maximum. However, you will not be charged for more instances than the
maximum number you've specified.
Some possible solutions:
1. Confirm that the app.yaml configuration you deployed matches what the App Engine console shows.
2. Set min_idle_instances to 1 and max_idle_instances to 2 (temporarily) and redeploy the application. Something may simply be wrong on the scaling side, and redeploying could fix it.
3. Check your logs (filtered to App Engine) for any problems shutting down the idle instances.
4. Tweak settings like max_pending_latency. I have seen people build applications that take 2-3 seconds to start up, while by default another instance is spun up after only 30ms. This post suggests the following settings, which you could try:
instance_class: F1
automatic_scaling:
  max_idle_instances: 1  # default value
  min_pending_latency: automatic  # default value
  max_pending_latency: 30ms
5. Switch to basic_scaling and let Google determine the best scaling algorithm (last resort option). This would look something like this:
basic_scaling:
  max_instances: 5
  idle_timeout: 15m
The solution could of course also be a combination of 2 and 4.
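Before tuning the pending-latency settings, it can help to know how long your own startup work actually takes. The following is only a rough sketch: timeStartup is a hypothetical helper, and the startup body is a placeholder for whatever your app does before it can serve requests.

```javascript
const { performance } = require('perf_hooks');

// Sketch: time a synchronous startup routine so the result can be compared
// with the pending-latency settings. timeStartup is a hypothetical helper.
function timeStartup(startupFn) {
  const startedAt = performance.now();
  startupFn();
  return performance.now() - startedAt; // elapsed milliseconds
}

const elapsedMs = timeStartup(() => {
  // Placeholder for real startup work (loading config, building a schema, ...).
  JSON.parse('{"ready": true}');
});
console.log(`Startup took ${elapsedMs.toFixed(1)} ms`);
```

If the measured time is far above max_pending_latency, requests may trigger extra instances while the first one is still starting.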
Update after 24 hours:
I followed @Nebulastic's suggestions, numbers 2 and 4, but it did not make any difference. So in frustration I disabled the entire Google App Engine application (App Engine > Settings > Disable application), left it off for 10 minutes, and confirmed in the monitoring dashboard that everything was dead (sorry, users!).
After 10 minutes I enabled App Engine again and it booted only 1 instance. I've been monitoring it closely since, and it seems (finally) to be good now. After the restart it also adheres to the "min" and "max" idle instances configuration suggested by @Nebulastic. Thanks!
Have you checked to make sure you don't have a bunch of old versions still running?
Check each service in the services dropdown.

Google App Engine - spawns new instance for every connection or has zero instances

I am noticing something a little odd with Google App Engine. If my app has not been used for a while and I open it, it takes some time to load. I also see in the GAE logs console that it is starting up a server during this time, which accounts for the wait (why not always have an instance running?).
After I open and close the app a couple of times, I notice in the versions tab of GAE that I have 7 running instances (all in the same version).
I'm a little confused about how GAE works. Does it scale your instances down to 0 when there are no requests for a while and, on the flip side, spin up a new instance for every new client that connects?
my app.yaml is looking like this:
runtime: nodejs10
env: standard
instance_class: F2
handlers:
- url: /.*
  secure: always
  redirect_http_response_code: 301
  script: auto
You need to fine-tune your App Engine scaling strategy. For example, take a look at this app.yaml file:
runtime: nodejs10
env: standard
instance_class: F2
handlers:
- url: /.*
  secure: always
  redirect_http_response_code: 301
  script: auto
automatic_scaling:
  min_instances: 1
  max_instances: 4
  min_idle_instances: 1
  max_concurrent_requests: 25
  target_throughput_utilization: 0.8
inbound_services:
- warmup
min_instances and min_idle_instances are set to 1 so that at least 1 instance is always ready for incoming requests, which avoids cold starts.
To avoid spinning up new instances too quickly, you can set max_concurrent_requests and target_throughput_utilization. In this example, a new instance will not be spun up until an existing instance reaches 20 concurrent requests (25 x 0.8).
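The arithmetic behind that 20-request figure can be sketched as a small helper. This only assumes the relationship described in this answer (max_concurrent_requests multiplied by target_throughput_utilization). The function name is hypothetical, and the range check is a generic sanity check, not App Engine's own validation.

```javascript
// Sketch: estimate how many concurrent requests one instance handles before
// the autoscaler is expected to start another instance.
// Assumption: threshold = max_concurrent_requests * target_throughput_utilization.
function scaleOutThreshold(maxConcurrentRequests, targetThroughputUtilization) {
  if (targetThroughputUtilization <= 0 || targetThroughputUtilization > 1) {
    throw new RangeError('targetThroughputUtilization must be in (0, 1]');
  }
  return maxConcurrentRequests * targetThroughputUtilization;
}

const threshold = scaleOutThreshold(25, 0.8);
console.log(`Scale-out threshold: about ${Math.round(threshold)} concurrent requests`);
```

Raising either value in app.yaml raises this threshold, so each instance absorbs more traffic before another one is started.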
As mentioned in this document, you need to create a warmup endpoint in your application and add inbound_services to your app.yaml file, for example:
app.get('/_ah/warmup', (req, res) => {
  // Handle your warmup logic. Initiate db connection, etc.
  res.sendStatus(200); // Reply so the warmup request completes.
});
Warmup requests prepare your instances before a real request arrives, which reduces the latency of the first request.
As you did not specify any scaling setting in your app.yaml, App Engine is using automatic scaling.
That means the application has a minimum of 0 instances, so when your app is not receiving any requests at all it will scale down to 0. With that option you save the cost of having an instance running all the time, but cold starts will happen. A cold start happens each time a request reaches your application when no instance is ready to serve it, so a new one has to be created.
As for your application scaling up to 7 instances when the traffic load increases, that again depends on the workload it is receiving. You can control this behaviour as well by using the max_instances setting, although a low value could hurt your application's performance if more instances are needed.
App Engine spins up new instances when the threshold for target_cpu_utilization, target_throughput_utilization, max_concurrent_requests, max_pending_latency or min_pending_latency is reached. You can read about all of them here.

Running min 1 instance of Google-App-Engine in standard environment

Looking at the Google-App-Engine's two environments, standard and flex, most of the features offered by standard seem more appropriate for my use case.
According to the documentation, both the standard and flex environments support automatic scaling, but standard can scale down to 0 instances while flex can only scale down to 1 instance.
According to the documentation, one option for automatic scaling is specifying the min/max number of instances running at any given moment. I would have thought that this would 'override' the standard environment's ability to scale to zero, but after my service had seen no traffic for 15 hours, it still shut down the last remaining instance.
I have the following config-settings in my app.yaml file.
runtime: nodejs10
automatic_scaling:
  min_instances: 1
  max_instances: 1  # Increase in production
  target_cpu_utilization: 0.95
I was trying to force GAE to have 1 running instance at any time while in testing. I realize that having a static number of instances running is not the point of automatic scaling, but I plan to increase the maximum number of instances when moving to production. I have also tried adding min_idle_instances: 1 to the settings without any difference.
Can standard environment be forced to have a minimum of 1 running instance at any time?
One way to ensure that your instance is ready to serve is to configure warmup requests.
Bear in mind that even with warmup requests, you might still encounter loading requests. If your app has no traffic, the first request will always be a loading request and not a warmup. Thus, in my opinion, the best way to approach a situation like this is to set min_instances to 2.
Example of an express.js handler:
const express = require('express');
const app = express();
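app.get('/_ah/warmup', (req, res) => {
  // Handle your warmup logic. Initiate db connection, etc.
  res.sendStatus(200); // Reply so the warmup request completes.
});
// Rest of your application handlers ("handler" is your own route function).
app.get('/', handler);
app.listen(process.env.PORT || 8080);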
Example of app.yaml addition:
inbound_services:
- warmup
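Since the first request to an idle app can be a loading request rather than a warmup, it may be worth routing both through the same initialization path. Below is a minimal sketch of that idea. initResources and ensureReady are hypothetical names, and the startup work is a placeholder.

```javascript
// Sketch: run expensive startup work once, whether the first hit is a
// warmup request (/_ah/warmup) or an ordinary loading request.
// initResources and ensureReady are hypothetical names for illustration.
let initCount = 0; // only here to show that the work runs once
let readyPromise = null;

async function initResources() {
  initCount += 1;
  // Placeholder for real startup work, e.g. opening a database pool.
  return { startedAt: Date.now() };
}

function ensureReady() {
  // Cache the promise so concurrent callers share one initialization.
  if (readyPromise === null) {
    readyPromise = initResources().catch((err) => {
      readyPromise = null; // allow a retry on the next request
      throw err;
    });
  }
  return readyPromise;
}

// Possible wiring in Express:
//   app.get('/_ah/warmup', async (req, res) => {
//     await ensureReady();
//     res.sendStatus(200);
//   });
```

Ordinary route handlers can await ensureReady() as well, so a loading request pays the startup cost once and later requests reuse the result.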
A workaround could be a cron job that triggers every minute, so that an instance stays available to serve requests. However, even with this approach, setting min_instances to 2 is a better solution.
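If you try the cron workaround, the endpoint it hits should probably accept only cron traffic. The sketch below assumes a cron.yaml entry (not shown) that targets a hypothetical /keepalive path. App Engine adds the X-Appengine-Cron: true header to cron requests, and Node.js exposes header names in lowercase. The function names are illustrative.

```javascript
// Sketch: accept keep-alive pings only from the App Engine cron service.
// handleKeepAlive, isCronRequest and the /keepalive path are hypothetical.
function isCronRequest(headers) {
  return headers['x-appengine-cron'] === 'true';
}

function handleKeepAlive(headers) {
  if (!isCronRequest(headers)) {
    return { status: 403, body: 'Forbidden' };
  }
  return { status: 200, body: 'ok' };
}

// Possible wiring in Express:
//   app.get('/keepalive', (req, res) => {
//     const result = handleKeepAlive(req.headers);
//     res.status(result.status).send(result.body);
//   });
```

Keeping the check in a plain function makes it easy to test without starting a server.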
