ERROR: The requested amount of instances has exceeded GCE's default quota - google-app-engine

I decided to use App Engine Flexible, but I am getting this error:
The requested amount of instances has exceeded GCE's default quota. Please see https://cloud.google.com/compute/quotas for more information on GCE resources
I have a billing account connected and have $300 in credit.
My app.yaml:
runtime: nodejs
service: server
env: flex
network:
  session_affinity: true
handlers:
- url: /.*
  secure: always
  redirect_http_response_code: 301
  script: auto
I've been trying to figure out how to fix this for a whole day now :(
Does anyone understand why this is so?

As suggested by @mahboob, the answer to this question is covered in the question linked above.
As per the GCP documentation on the max_num_instances parameter, the maximum number of instances in your project is 8 by default, and I can see you are using 15. Increasing the quota limit for your project should solve the issue.
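Alternatively, if you would rather stay under the default quota than request an increase, you can cap the autoscaler in app.yaml. A minimal sketch, assuming the Flexible environment from the question (the numbers are illustrative; set them below your actual quota):
runtime: nodejs
env: flex
service: server
automatic_scaling:
  min_num_instances: 1
  max_num_instances: 7  # keep this below your region's CPU/instance quota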

If you just want to get through the deployment and don't care about scaling, the following might help.
First, delete as many old versions and instances as possible, and repeat the deletion several times, since deletions can fail.
Then, in your app.yaml, limit the service to just 1 instance and try to deploy again:
manual_scaling:
  instances: 1
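Applied to the app.yaml from the question, that would look roughly like this (a sketch; keep your own handlers):
runtime: nodejs
service: server
env: flex
manual_scaling:
  instances: 1
network:
  session_affinity: true
handlers:
- url: /.*
  secure: always
  redirect_http_response_code: 301
  script: auto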

Related

Why is my task failing in Google's App Engine?

About 3-4 times a week, one of my two 12-hour tasks, which acts as an ETL from an API endpoint to a Snowflake DB, fails, and I can't figure out exactly why.
The Cron Task Manager says it last ran at 6:29am this morning, but when I retrieve the logs there's only one line, which says:
This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.
I'm not sure if I need a warm-up, to allocate specific workers, etc., because the one-line log is so uninformative to me. I'm using a fairly sizable instance class that I was hoping could handle most of the workload.
Here is what the logs of a successful run look like:
https://github.com/markamcgown/GF/blob/main/downloaded-logs-success2.csv
And the failure:
https://github.com/markamcgown/GF/blob/main/downloaded-logs-20210104-074656.csv
app.yaml:
service: vetdata-loader
runtime: python38
instance_class: F4_1G
handlers:
- url: /task/loader
  script: auto
Update: here is my most recent app.yaml, which fails less often now but still sometimes:
service: vetdata-loader
runtime: python38
instance_class: B4_1G
handlers:
- url: /task/loader
  script: auto
basic_scaling:
  max_instances: 11
  idle_timeout: 30m
I think you are not using the correct instance class. If you look here at the timeouts for task calls, you are limited to 10-minute requests with automatic scaling, but up to 24 hours with basic or manual scaling.
Looking at your instance_class, the Fxxx types are meant for automatic scaling. Use a B4_1G instance class instead and check whether you still have these issues. You should not.
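For completeness, the scheduled call itself is defined in cron.yaml. A minimal sketch (the description and schedule here are assumptions; use your actual interval):
cron:
- description: "ETL load from API to Snowflake"
  url: /task/loader
  schedule: every 12 hours
  target: vetdata-loader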

Why are idle instances not being shut down when there is no traffic?

Some weeks ago, my app on App Engine started increasing its number of idle instances to an unreasonably high amount, even when there is close to zero traffic. This of course impacts my bill, which is skyrocketing.
My app is a simple Node.js application serving a GraphQL API that connects to my Cloud SQL database.
Why are all these idle instances being started?
My app.yaml:
runtime: nodejs12
service: default
handlers:
- url: /.*
  script: auto
  secure: always
  redirect_http_response_code: 301
automatic_scaling:
  max_idle_instances: 1
[Screenshot of monitoring]
This is very strange behavior; per the documentation, it should only temporarily exceed max_idle_instances:
Note: When settling back to normal levels after a load spike, the number of idle instances can temporarily exceed your specified maximum. However, you will not be charged for more instances than the maximum number you've specified.
Some possible solutions:
1. Confirm that the app.yaml configuration you deployed is the same as what the App Engine console shows.
2. Set min_idle_instances to 1 and max_idle_instances to 2 (temporarily) and redeploy the application. It could be that something is simply wrong on the scaling side, and redeploying the application could solve this.
3. Check your logging (filtered to App Engine) for any problem in shutting down the idle instances.
4. Tweak settings like max_pending_latency. I have seen people build applications that take 2-3 seconds to start up, while the default is 30ms before another instance is spun up.
This post suggests setting the following, which you could try:
instance_class: F1
automatic_scaling:
  max_idle_instances: 1  # default value
  min_pending_latency: automatic  # default value
  max_pending_latency: 30ms
5. Switch to basic_scaling and let Google determine the best scaling algorithm (a last-resort option). This would look something like this:
basic_scaling:
  max_instances: 5
  idle_timeout: 15m
The solution could, of course, also be a combination of suggestions 2 and 4.
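A combined version of suggestions 2 and 4 would look roughly like this (a sketch; tune the values to your traffic):
automatic_scaling:
  min_idle_instances: 1
  max_idle_instances: 2
  min_pending_latency: automatic
  max_pending_latency: 30ms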
Update after 24 hours:
I followed @Nebulastic's suggestions, numbers 2 and 4, but it did not make any difference. So in frustration I disabled the entire Google App Engine application (App Engine > Settings > Disable application), left it off for 10 minutes, and confirmed in the monitoring dashboard that everything was dead (sorry, users!).
After 10 minutes I enabled App Engine again and it booted only 1 instance. I've been monitoring it closely since, and it (finally) seems to be good now. After the restart it also adheres to the min and max idle instances configuration, as in @Nebulastic's suggestion. Thanks!
[Screenshots]
Have you checked to make sure you don't have a bunch of old versions still running? https://console.cloud.google.com/appengine/versions
Check each service in the services dropdown.

App running in Google App Engine fails, tries /_ah/start for minutes, then restarts

I have a message-processor task that runs in App Engine. Many times it appears to die, then spends several minutes logging attempts at /_ah/start, and then finally restarts.
This task responds to messages from the message queue, then writes data from these messages to a MySQL database.
Looking at the log histogram, it appears that this task is on a 15-minute cycle: it works for a bit, then does this /_ah/start loop for a bit, then goes back to working.
When I start sending a heavy load of messages to process, it loses messages, which is not an optimal situation for a production environment.
I really don't even know where to check to find out what is going on.
I am sorry, but search as I might, I really cannot find good information on how to use the /_ah/start process. A link to a good explanation and example would be worth a lot.
My process is very simple:
start up
wait for message
store data in database
ack message
go back to wait for next message
Here is a copy of my app.yaml file:
manual_scaling:
  instances: 1
resources:
  cpu: 1
  memory_gb: 0.5
  disk_size_gb: 10
service: message-processor
runtime: nodejs10
env_variables:
  BUCKET_NAME: "stans_temp"
handlers:
- url: /stylesheets
  static_dir: stylesheets
- url: /.*
  secure: always
  redirect_http_response_code: 301
  script: auto
Thanks for any help.
I would start by correcting the configuration errors in app.yaml.
Since you have runtime: nodejs10 and no env: flex setting, this appears to be the App Engine Standard environment (app.yaml reference for Standard).
However, I can see that you have a resources setting, which is only valid for App Engine Flexible (app.yaml reference for Flexible).
App Engine Flexible and App Engine Standard are practically two different products, so you need to decide which one you want to use; you can find an article about the differences here. This might be the reason; I am even surprised that this deployed successfully.
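For example, if you intend to run on Flexible (where resources is valid), a minimal corrected app.yaml might look like this (a sketch; keep your own env_variables and handlers):
runtime: nodejs10
env: flex  # opts into the Flexible environment, making `resources` valid
service: message-processor
manual_scaling:
  instances: 1
resources:
  cpu: 1
  memory_gb: 0.5
  disk_size_gb: 10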

How to solve high latency in App Engine caused by "This request caused a new process to be started for your application..."?

The app runs in the App Engine standard environment with Python 3.7 and Cloud SQL (MySQL).
Checking the logs, some requests have very high latencies (more than 4 seconds) when around 800ms is expected. All of these log entries are accompanied by this message:
"This request caused a new process to be started for your application,
and thus caused your application code to be loaded for the first time.
This request may thus take longer and use more CPU than a typical
request for your application."
I understand that "a new process" refers to the deployment of a new instance (since I use automatic scaling). The strange thing is that when I compare these log entries against instance deployments, some match but others do not.
My question is, how can these latencies be reduced?
The app engine config is:
runtime: python37
env: standard
instance_class: F1
handlers:
- url: /static/(.*)
  static_files: static/\1
  require_matching_file: false
  upload: static/.*
- url: /.*
  script: auto
  secure: always
- url: .*
  script: auto
automatic_scaling:
  min_idle_instances: automatic
  max_idle_instances: automatic
  min_pending_latency: automatic
  max_pending_latency: automatic
network: {}
As you note, these slower requests happen whenever App Engine needs to start a new instance for your application, as the initial load is slow (these are called "loading requests").
However, App Engine does provide a way to use "warmup" requests: basically, dummy requests to your application that start instances in advance of when they are actually needed. This can reduce, but not eliminate, the user-affecting loading requests.
This can slightly increase your costs, but it should reduce loading-request latency, as these dummy requests are the ones that eat the cost of starting a new instance.
In the python 3.7 runtime, you can add a "warmup" element to the inbound_services directive in app.yaml:
inbound_services:
- warmup
This will send requests to /_ah/warmup, where, if you want, you can do any other initialization the instance needs (e.g. starting a DB connection pool).
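For illustration, a minimal warmup handler in the Python 3 runtime might look like this (a sketch assuming Flask; the commented-out connection-pool helper is hypothetical):

from flask import Flask

app = Flask(__name__)

# Hypothetical: create expensive resources at module load time, so the
# warmup request pays the startup cost instead of a user request.
# db_pool = create_db_pool()

@app.route('/_ah/warmup')
def warmup():
    # Do any per-instance initialization here, e.g. priming caches or
    # opening a database connection pool.
    return '', 200, {}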
There are more strategies that may help you decrease latencies in your application.
You can modify your automatic_scaling options to something that suits your app better.
You can manage your bandwidth better by setting an appropriate Cache-Control header on your responses and setting reasonable expiration times for static files (see the sketch after this list).
Using public Cache-Control headers in this way allows proxy servers and your clients' browsers to cache responses for the designated period of time.
You can use a bigger instance class, like F2, so that horizontal scaling happens less often. As I understood from this issue, your latencies increase mostly while new instances are being deployed.
You can also enable concurrent requests and write your code as asynchronously as you can.
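For example, expiration times for static files can be set directly in app.yaml via the expiration element (the values here are placeholders):
default_expiration: "4d 12h"  # applies to all static handlers unless overridden
handlers:
- url: /static/(.*)
  static_files: static/\1
  upload: static/.*
  expiration: "30d"  # long cache lifetime for rarely-changing assets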

AppEngine NodeJS flexible spawns 2 instances after deployment

I have a pretty basic app.yaml file with the following:
runtime: nodejs
env: flex
service: front
And every time I deploy the application, the deployment takes a very long time at the step:
Updating service [front] (this may take several minutes)...
When I check in the console, I can see that it goes from 1 instance up to 2, even though I didn't specify anything about the number of instances. Why is Google doing this, and how can we set the starting number of instances without disabling the autoscaling feature? Thanks in advance!
On App Engine Flexible applications, the minimum number of instances given to your service defaults to 2 to reduce latency. This is documented here.
You can configure these settings differently by adding them in your app.yaml file like this:
runtime: nodejs
env: flex
service: front
automatic_scaling:
  min_num_instances: 1   # default is 2; must be 1 or greater
  max_num_instances: 10  # default is 20
