How can I warm up an AppEngine Flex app after a deployment? - google-app-engine

It appears that AppEngine standard has a warmup feature to warm up an app after a deployment but I don't see the same feature available for Flex. The readiness & liveness probes also don't work for this since setting the path setting to a custom path inside the application doesn't seem to make the probes actually hit the internal endpoint.
Is there some solution I'm missing other than doing things like manually hitting the endpoints myself after the deployment which won't be very reliable since the calls don't necessarily always round robin to each instance?

In App Engine Standard, warmup requests essentially load your app's code into a new instance before any live requests reach that instance. This can happen in the following situations:
When you redeploy a version of your app.
When new instances are created due to the load from requests exceeding the capacity of the current set of running instances.
When maintenance and repairs of the underlying infrastructure or physical hardware occur.
In App Engine Flexible, you can achieve the same result by using the initial_delay_sec setting for liveness checks in your app.yaml file. If you set its value high enough to give your code time to initialize, the first request reaching that instance will be processed quickly by your already-initialized code.

Related

Best configuration for Automatic Scaling in Google App Engine to always have an instance available?

What is the best way to set up Google App Engine to always have at least one instance ready and available to handle requests when using automatic scaling? This is for a low traffic application.
There are settings here that allow you to control them but I am not sure what the best combination is and some of them sound confusing. For example, min-instances and min-idle-instances sound similar. I tried setting min-instances to 1 but I still experienced lag.
What is a configuration that from the end user's point of view is always on and handles requests without lag (for a low traffic application)?
In the App Engine Standard environment, users may experience more latency while your application's code is being loaded onto a new instance; warmup requests can help reduce this latency. Warmup requests load the app's code onto a new instance before any live requests reach it. If this is enabled, App Engine detects when your application needs a new instance and initiates a warmup request to initialize it. You can check this link for Configuring Warmup Requests to Improve Performance.
Regarding min-instances and min-idle-instances, these only apply when warmup requests are enabled. As described in this post, the difference between the two is that min-instances are used to process incoming requests immediately, while min-idle-instances are used to absorb high traffic load.
However, you mentioned that you don't need warmup, so we suggest selecting App Engine Flexible; according to this documentation, it must have at least one instance running and can scale up in response to traffic. Please note that this environment is more expensive. You can refer to this link for the pricing of the two App Engine environments.

Slow initial connection to Google Cloud App Engine

Initial connections to my website are extremely slow (100+ seconds). How can I diagnose the issue?
Using the Chrome dev tool network tab, I see that the issue is "initial connection" and not things like SSL or Waiting/TTFB.
This only happens for the first page visit to the website for a given device; after the first page loads, everything on the website is very fast. This consistently happens for new devices, on the same device if I don't visit the website for X days, and on the same device if I clear the cache and browsing history.
The website is a Django app hosted on Google Cloud App Engine with 2 flexible instances.
User traffic to the website is low, so I doubt the issue is load balancing or traffic spikes.
Thanks!
Yesterday I tried opening the page and I noticed 1.8min to load the main page and 2.1min to make a search, later attempts were faster as you mentioned. I also tried accessing the page today and it loaded quite fast.
From my understanding, the high latency of the first connection might be related to session handling, database connections, SSL certificate problems, a large amount of uncached data, or expensive operations run before the server sends the response. It's nearly impossible for us to determine the cause without access to your code, logs and database configurations.
As for how to narrow down the issue I might suggest the following:
Examine your logs for possible causes.
Add timeit logging interleaved with each of the statements that handle the requests and watch for bottlenecks or long-running code.
Deploy the same endpoint without logos, images and other data that would be browser-cached and see if it makes a difference.
Create a simple hello-world endpoint and check its latency. Keep slowly evolving the endpoint to resemble your actual handling code, in the hope of finding what causes the issue.
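The timing suggestion in the list above could be sketched as follows. This is only an illustration: it uses `time.perf_counter` inside a context manager rather than the `timeit` module, and `handle_request` with its two `time.sleep` calls is a hypothetical stand-in for your real view code.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("request_timing")


@contextmanager
def timed(label, results=None):
    """Log how long the wrapped block took; optionally record it in `results`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        logger.info("%s took %.3f s", label, elapsed)
        if results is not None:
            results[label] = elapsed


def handle_request():
    """Hypothetical handler: each step is wrapped so slow steps stand out in the logs."""
    timings = {}
    with timed("load_session", timings):
        time.sleep(0.01)  # stand-in for session handling
    with timed("query_database", timings):
        time.sleep(0.02)  # stand-in for a database query
    return timings
```

Wrapping one statement at a time this way makes it easier to see in the request logs which step accounts for most of the first-request latency.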
If only the first connection is slow, it might be because the instance is starting and you do not have minimum idle instances and warmup requests enabled. Enabling them keeps instances ready to take traffic, so the latency of the first connection will be lower.
As it is stated in the documentation:
If you set a minimum number of idle instances, pending latency will have less effect on your application's performance. Because App Engine keeps idle instances in reserve, it is unlikely that requests will enter the pending queue except in exceptionally high load spikes. You will need to test your application and expected traffic volume to determine the ideal number of instances to keep in reserve.
Also you can find more information about warmup requests in this documentation about Configuring Warmup Requests to Improve Performance
I resolved this issue by removing and recreating custom domain settings for my App Engine project, and removing and recreating the corresponding DNS records in domains.google, following these instructions:
https://cloud.google.com/appengine/docs/standard/python/mapping-custom-domains
I'm still not sure what the underlying issue was, but this fixed it. Hope this can help anyone encountering a similar issue.
I had this exact issue. It was killing our load performance once we switched to a load balancer.
What it ended up being was the instance group port setting. We're obviously using SSL certs for the site, but I had indicated ports 80 and 443.
Once I removed port 80 from the instance group that the load balancer refers to, it loaded all the pages immediately.

How can I prevent a non-promoted instance from consuming a message

Whenever I deploy a new version in Google App Engine, and I transfer traffic to it, the previous version still consumes messages from our message broker. How can I make sure only the newly deployed version will consume messages without shutting down the old instances?
If you have multiple versions deployed while traffic is being migrated, you can check the current version using the Modules API and compare that with the default version.
Your check might look something like this:
from google.appengine.api import modules

default_version = modules.get_default_version()
instance_version = modules.get_current_version_name()

# You may additionally want to query the instances of the default version
# to make sure they've booted up and are actively serving traffic.
if default_version != instance_version:
    # Don't consume messages on this instance.
    pass
In the code example above, the default version is the version traffic is being migrated to, and the current version is the version of the instance.
See also Using the Modules API.
Note: Services were formerly known as modules and the API methods still reflect that naming.
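To show how such a version check might gate a consumer loop, here is a small sketch that does not depend on any particular broker. The functions `should_consume` and `drain` are hypothetical helpers, and the version lookups are passed in as callables so that, on App Engine Standard, you could supply `modules.get_default_version` and `modules.get_current_version_name`.

```python
def should_consume(default_version, instance_version):
    """Return True only when this instance runs the default (promoted) version."""
    return default_version == instance_version


def drain(messages, get_default_version, get_instance_version, handle):
    """Handle messages until this instance is no longer the default version.

    `messages` is any iterable of broker messages (hypothetical source) and
    `handle` is the callback that processes one message.
    """
    iterator = iter(messages)
    handled = 0
    # Re-check before pulling each message so a promotion mid-loop is noticed
    # without taking a message that this instance will not process.
    while should_consume(get_default_version(), get_instance_version()):
        try:
            message = next(iterator)
        except StopIteration:
            break
        handle(message)
        handled += 1
    return handled
```

Checking before each pull, rather than once at start-up, matters because an instance of the old version keeps running after traffic has been moved to the new one.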
During the deployment of your service, you can use --promote and --stop-previous-version options.
https://cloud.google.com/sdk/gcloud/reference/app/deploy
However, generally it's best to gradually migrate the traffic. This is the case for both users and backend services. Since you can't deploy two services at the exact same time, imagine your GAE deployment is delayed by a few seconds. Do you expect the messages to be consumed by your running service that is getting replaced? So, it shouldn't matter if a few messages still got routed to the old instance while the traffic is migrating. That would be the right design.

GAE shutdown or restart all the active instances of a service/app

In my app (Google App Engine Standard Python 2.7) I have some flags in global variables that are initialized (values read from memcache/Datastore) when the instance starts (at the first request). Those values don't change often, only once a month or in case of emergencies (i.e. when the Google App Engine Taskqueue or Memcache services are not working well, which has happened no more than twice a year as reported in GC Status but seriously affected my app and my customers: https://status.cloud.google.com/incident/appengine/15024 https://status.cloud.google.com/incident/appengine/17003).
I don't want to store these flags in memcache nor Datastore for efficiency and costs.
I'm looking for a way to send a message to all instances (see my previous post GAE send requests to all active instances ):
As stated in https://cloud.google.com/appengine/docs/standard/python/how-requests-are-routed
Note: Targeting an instance is not supported in services that are configured for auto scaling or basic scaling. The instance ID must be an integer in the range from 0, up to the total number of instances running. Regardless of your scaling type or instance class, it is not possible to send a request to a specific instance without targeting a service or version within that instance.
but another solution could be:
1) Send a shutdown message/command to all instances of my app or a service
2) Send a restart message/command to all instances of my app or service
I use only automatic scaling, so I can't send a request targeted to a specific instance (I can get the list of active instances using the GAE Admin API).
Is there any way to do this programmatically in Python GAE? Manually in the GCP console it's easy when having a few instances, but for 50+ instances it's a pain...
One possible solution (actually more of a workaround), inspired by your comment on the related post, is to obtain a restart of all instances by re-deployment of the same version of the app code.
Automated deployments are also possible using the Google App Engine Admin API, see Deploying Your Apps with the Admin API:
To deploy a version of your app with the Admin API:
Upload your app's resources to Google Cloud Storage.
Create a configuration file that defines your deployment.
Create and send the HTTP request for deploying your app.
It should be noted that (re)deploying an app version which handles 100% of the traffic can cause errors and traffic loss due to:
overwriting the app files actually being in use (see note in Deploying an app)
not giving GAE enough time to spin up sufficient instances fast enough to handle high incoming traffic rates (more details here)
Using different app versions for the deployments and gradually migrating traffic to the newly deployed apps can completely eliminate such loss. This might not be relevant in your particular case, since the old app version is already impaired.
Automating traffic migration is also possible, see Migrating and Splitting Traffic with the Admin API.
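As a rough sketch of what such a traffic-split request could look like, the helper below only builds the HTTP method, URL and JSON body for a PATCH on the service resource with `updateMask=split`; it does not send anything. `build_split_request` is a hypothetical name, the app, service and version IDs are placeholders, and actually sending the request would additionally require an OAuth access token.

```python
import json

ADMIN_API = "https://appengine.googleapis.com/v1"


def build_split_request(app_id, service, allocations):
    """Build (method, url, body) for updating a service's traffic split.

    `allocations` maps version IDs to traffic fractions that must sum to 1.
    """
    total = sum(allocations.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("allocations must sum to 1, got %r" % total)
    url = "{}/apps/{}/services/{}?updateMask=split".format(
        ADMIN_API, app_id, service)
    body = json.dumps({"split": {"allocations": allocations}}, sort_keys=True)
    return "PATCH", url, body


# Example with placeholder IDs: move 10% of traffic to a new version.
method, url, body = build_split_request(
    "my-app", "default", {"v1": 0.9, "v2": 0.1})
```

Raising the share of the new version in several small steps is one way to approximate the gradual migration described above.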
It's possible to use the Google Cloud API to stop all the instances. They would then be automatically scaled back up to the required level. My first attempt at this would be a process where:
The config item was changed
The current list of instances was enumerated from the API
The instances were shut down over a time period that allows new instances to be spun up to replace them, balanced against how time sensitive the config change is. Perhaps close one instance per 60s.
In terms of using the API you can use the gcloud tool (https://cloud.google.com/sdk/gcloud/reference/app/instances/):
gcloud app instances list
Then delete the instances with:
gcloud app instances delete instanceid --service=s1 --version=v1
There is also a REST API (https://cloud.google.com/appengine/docs/admin-api/reference/rest/v1/apps.services.versions.instances/list):
GET https://appengine.googleapis.com/v1/{parent=apps/*/services/*/versions/*}/instances
DELETE https://appengine.googleapis.com/v1/{name=apps/*/services/*/versions/*/instances/*}
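The rolling shutdown described above might be scripted roughly like this. It is a sketch under assumptions: it shells out to the `gcloud app instances` commands shown above, asks for JSON output with `--format=json`, and assumes each listed entry carries an `id` field. The `run` and `sleep` parameters are hypothetical injection points so the logic can be tried without touching a real project.

```python
import json
import subprocess
import time


def list_instances(service, version, run=subprocess.check_output):
    """Return the instances of one service version as parsed JSON."""
    out = run(["gcloud", "app", "instances", "list",
               "--service=" + service, "--version=" + version,
               "--format=json"])
    return json.loads(out)


def rolling_delete(service, version, pause_seconds=60,
                   run=subprocess.check_output, sleep=time.sleep):
    """Delete instances one at a time, pausing so replacements can start."""
    instances = list_instances(service, version, run)
    deleted = []
    for index, instance in enumerate(instances):
        instance_id = instance["id"]  # assumption about the JSON shape
        run(["gcloud", "app", "instances", "delete", instance_id,
             "--service=" + service, "--version=" + version, "--quiet"])
        deleted.append(instance_id)
        if index < len(instances) - 1:
            sleep(pause_seconds)  # give autoscaling time to replace the instance
    return deleted
```

A call such as `rolling_delete("s1", "v1", pause_seconds=60)` would then remove roughly one instance per minute; a shorter pause restarts the fleet faster at the cost of less spare capacity during the rollout.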

Is there any way to access Firebase from an App Engine servlet without using manual scaling or the flexible environment?

Question:
Is there any way to access Firebase using server-side code without using either manual scaling or the flexible environment?
Context:
I want to achieve the following flow:
app posts pending 'updates' to firebase -> backend picks them up -> backend sends emails -> backend modifies firebase 'updates' to non-pending state
From what I can see, if I want the back end to pick these updates up in real time, I need a long-running thread in the App Engine Flexible Environment. I'm prepared to forego this to avoid the flexible environment's pricing model and beta status.
Given that my choice is therefore the App Engine Standard Environment, it appears that to access Firebase I'm stuck with having to enable manual scaling.
It seems madness to have a resident instance running all the time - when there's no requirement to listen to updates in real time - which sits idle for 95% of the time then isn't available outside of a 9-instance-hour (free) contiguous period.
Can Firebase be somehow accessed from a server-side without 'attaching a listener', such that I can call it from an automatically scaled instance and simply get a snapshot? If not, is there an alternate technical or architectural solution here - or something I'm missing?
This must be a fairly common issue!
Thank you for your time, much appreciated.
