I'm using Google App Engine flexible environment on PHP 7.2 application and we are using auto_scaling.
We occasionally get traffic spikes, but App engine will always keep 4 instances running.
What settings to add to app.yaml to force App Engine to always keep only one idle instance.
Thank you.
Since you are using App Engine Flex with auto scaling, you can't use the flag "max_idle_instances" in the app.yaml since this would give you an error; however, I would advice you to set a "max_num_instances" in order to prevent you from starting too many instances during one of your spikes.
I would also advice you to check out this video and this other one, as they explain very well how to set up your app.yaml to optimize your scaling.
Hope you find this helpful!
Related
It appears that AppEngine standard has a warmup feature to warm up an app after a deployment but I don't see the same feature available for Flex. The readiness & liveness probes also don't work for this since setting the path setting to a custom path inside the application doesn't seem to make the probes actually hit the internal endpoint.
Is there some solution I'm missing other than doing things like manually hitting the endpoints myself after the deployment which won't be very reliable since the calls don't necessarily always round robin to each instance?
In App Engine Standard, warmup requests essentially load your app's code into a new instance before any live requests reach that instance. This can happen in the following situations:
When you redeploy a version of your app.
When new instances are created due to the load from requests
exceeding the capacity of the current set of running instances.
When maintenance and repairs of the underlying infrastructure or
physical hardware occur
In App Engine Flexible, you can achieve the same result by using the initial_delay_sec setting for liveness checks in your app.yaml file. If you set up its value to give enough time for your code to initialize, the first request coming to that instance will be processed quickly by your already-initialized code.
The doc of the standard environment mentions that max_concurrent_requests can be set. But this setting is not documented in the doc of the flex environment? I just deployed an app in the flex env with this option in the app.yaml and did not get an error. So can I assume it is also supported in the flex env?
And when doing websockets, with max_concurrent_requests=10, will having more than 10 simultaneous websocket connections result in an extra instance?
The short answer is unfortunately no, max_concurrent_requests in App Engine Flexible is not supported. In order for it to properly function, it has to be used along with target_throughput_utilization, however, you will get an error as soon as you specify it in the app.yaml of the application.
The way of controlling as to "when" the application will scale in App Engine Flexible is by specifying a target CPU utilization with cpu_utilization and target_utilization.
Google introduced target_concurrent_requests to App Engine Flexiable
https://cloud.google.com/appengine/docs/flexible/custom-runtimes/configuring-your-app-with-app-yaml#automatic_scaling
In my app (Google App Engine Standard Python 2.7) I have some flags in global variables that are initialized (read values from memcache/Datastore) when the instance start (at the first request). That variables values doesn't change often, only once a month or in case of emergencies (i.e. when google app engine Taskqueue or Memcache service are not working well, that happened not more than twice a year as reported in GC Status but affected seriously my app and my customers: https://status.cloud.google.com/incident/appengine/15024 https://status.cloud.google.com/incident/appengine/17003).
I don't want to store these flags in memcache nor Datastore for efficiency and costs.
I'm looking for a way to send a message to all instances (see my previous post GAE send requests to all active instances ):
As stated in https://cloud.google.com/appengine/docs/standard/python/how-requests-are-routed
Note: Targeting an instance is not supported in services that are configured for auto scaling or basic scaling. The instance ID must be an integer in the range from 0, up to the total number of instances running. Regardless of your scaling type or instance class, it is not possible to send a request to a specific instance without targeting a service or version within that instance.
but another solution could be:
1) Send a shutdown message/command to all instances of my app or a service
2) Send a restart message/command to all instances of my app or service
I use only automatic scaling, so I'cant send a request targeted to a specific instance (I can get the list of active instances using GAE admin API).
it's there any way to do this programmatically in Python GAE? Manually in the GCP console it's easy when having a few instances, but for 50+ instances it's a pain...
One possible solution (actually more of a workaround), inspired by your comment on the related post, is to obtain a restart of all instances by re-deployment of the same version of the app code.
Automated deployments are also possible using the Google App Engine Admin API, see Deploying Your Apps with the Admin API:
To deploy a version of your app with the Admin API:
Upload your app's resources to Google Cloud Storage.
Create a configuration file that defines your deployment.
Create and send the HTTP request for deploying your app.
It should be noted that (re)deploying an app version which handles 100% of the traffic can cause errors and traffic loss due to:
overwriting the app files actually being in use (see note in Deploying an app)
not giving GAE enough time to spin up sufficient instances fast enough to handle high income traffic rates (more details here)
Using different app versions for the deployments and gradually migrating traffic to the newly deployed apps can completely eliminate such loss. This might not be relevant in your particular case, since the old app version is already impaired.
Automating traffic migration is also possible, see Migrating and Splitting Traffic with the Admin API.
It's possible to use the Google Cloud API to stop all the instances. They would then be automatically scaled back up to the required level. My first attempt at this would be a process where:
The config item was changed
The current list of instances was enumerated from the API
The instances were shutdown over a time period that allows new instances to be spun up and replace them, and how time sensitive the config change is. Perhaps close on instance per 60s.
In terms of using the API you can use the gcloud tool (https://cloud.google.com/sdk/gcloud/reference/app/instances/):
gcloud app instances list
Then delete the instances with:
gcloud app instances delete instanceid --service=s1 --version=v1
There is also a REST API (https://cloud.google.com/appengine/docs/admin-api/reference/rest/v1/apps.services.versions.instances/list):
GET https://appengine.googleapis.com/v1/{parent=apps/*/services/*/versions/*}/instances
DELETE https://appengine.googleapis.com/v1/{name=apps/*/services/*/versions/*/instances/*}
I created an App Engine Flex NodeJS app, not realizing that there is no free tier. So I decided to switch to App Engine standard. I updated my app.yaml, deployed, and everything seems to be working. However, I deployed this a couple hours ago, and I still have 2 compute engine instances running. Is there something I need to do to shut those down, or did I just not wait long enough? I don't want them running at all because I don't want to pay for them, especially since the standard app doesn't use compute engine at all.
I tried going to the Compute Engine tab in GCP to see if I could do anything there, but all I get is a "Create instance" button.
Check the Versions page on the Google Cloud Console and make sure that the old version is deleted and that traffic is going to the new version. It may be a good idea to use a new version for the standard env.
Then on the Instances page check if the respective instances are running and shut them down if so.
I'm just going through the node.js tutorials with a free trial account, and i'm stuck on the second one where you add a db. I add the mongodb deployment, shows up as a VM instances, fine. And my first deploy worked, but now that i'm trying to edit stuff, my deploy's keep failing.
The error i get is that I've exceeded my CPU quota. Watching the list of VM Instances under Compute Engine, i see it keeps spawning up instances, even though the app isn't being used. Guessing it just spins up 8 instances by default?
But then i guess the build system needs its own VM's, but the CPU capacity is used up, so none available to do subsequent builds?! I feel like i'm missing something...
Also, i see i can explicitly start VM's myself, so what process is creating them form me? And can i turn it off? or set a cap on number of instances it spawns?
Can i tell my project to only use 4
Also, the deploy takes forever, is that normal? Following the tutorials, so far I've only seen this command to deploy:
gcloud preview app deploy app.yaml --set-default
Is there another command that does an incremental deploy or something?
By using gcloud preview app deploy you're actually using Managed VMs which is an App Engine runtime which in turn runs Docker containers on Google Compute Engine, which it creates on its own. In other words, you're not using Google Compute Engine directly.
To get rid of extra VMs, you need to delete old app versions: navigate to Compute > App Engine > Versions and delete the versions you don't want.
See also this answer for more details and suggestions.