Google App Engine autohealing affected by Quota usage - google-app-engine

I have a Google App Engine Quota of 3000 instances and with 98% usage. Hence the auto healing is being affected. As in the unhealthy instances are not autohealed. (I have a use case where each VM instance comes up hosting the App does some tasks and then marks itself unhealthy, expectation is that App Engine auto heals this and starts New VM with a clean state)
Based on my understanding of autohealing the process is:
New healthy VM is brought up and my app is started on this
Then the unhealthy instance is brought down/killed
Problem: As there is no buffer for the new instance to come up as my usage is 98%, this ends up making many of my VM instances stay in unhealthy state.
Possible Solution: Hence if I increase my instance quota to 3500 and set max_num_instances to 3000 in configuration (leaving 500 buffer for autohealing)will this solve the problem to some extent?

Related

App Engine can't update "The requested amount of instances has exceeded GCE's default quota" new version is deleted

I have this annoying error with App Engine; basically our platforms usualy have 2-3 services running on them using Docker container with Flex 'custom' set.
Flex 'custom' uses 2 instances by default. The default quota limit is 4.
When we deploy 2 services we reach the limit; once we've reached the limit we have no way of updating the old versions of the app; we get rejected with "The requested amount of instances has exceeded GCE's default quota" even for updating the exsiting service.
Also the new version gets deleted from App Engine 'versions' so right now we've locked out; we can't update the app, we can't delete it and we can't even stop it as it's the default we're trying to update.
Any ideas?

Which Google Cloud product to use to execute a very long process Google Cloud VideoIntelligence Analysis

I have been using Google Cloud Video Intelligence annotation functionality with Google App Engine Flex. When I try to use VideoIntelligence with a two hour video, it takes 60 minutes for the AnnotateVideo function to respond.
gs_video_path ='gs://'+bucket_name+'/'+videodata.video.path+videodata.video.name
print(gs_video_path)
video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.enums.Feature.OBJECT_TRACKING]
operation = video_client.annotate_video(gs_video_path, features=features)
Currently, the only place I can execute this is on Google App Engine Flex. Yet, Google App Engine Flex keeps an instance idle all the time, it is very similar to running a VM in terms of cost.
Google App Engine has a 540 seconds timeout and same wise Google Cloud Run has a timeout of 900 seconds and Google Cloud Functions has a max timeout of 600 seconds as far as I understand.
In these circumstances, which Google Cloud product should I use for a one hour process that should take place while avoiding having an idle instance when there is no usage.
(Please don't respond quoting GKE or other VM based solutions, no idle instance solutions are accepted)
Cloud Run’s 900 second timeout is subject to change soon to satisfy your needs (up to an hour). There's a feature in the works. I’ll update here once it's available in beta, stay tuned.
#ahmetb-todo
You can specify output_uri in the original request. This will write the final result to your GCS bucket. Then you don't have to wait for the long running operation to finish on your VM. The initial request will only take a few seconds so you can use Google Cloud Function.
When the operation finishes an hour later, you process the output json files by setting up a trigger on your output GCS bucket.
I don't think Google has a service that matches your needs.
Probably you should implement some custom workflow, like:
From "short living" environment, like Function, CloudRun or AppEngine do following:
Put an event for your long-running task to PubSub
Use Compute Engine API to start a VM
When VM starts, it's startup script should get latest item from the PubSub and start your long-running task
When task is done, VM terminates itself using ComputeEngine API or calls a Function that invokes shutdown

How to programmatically scale up app engine?

I have an application which uses app engine auto scaling. It usually runs 0 instances, except if some authorised users use it.
This application need to run automated voice calls as fast as possible on thousands of people with keypad interactions (no, it's not spam, it's redcall!).
Programmatically speaking, we ask Twilio to initialise calls through its Voice API 5 times/sec and it basically works through webhooks, at least 2, but most of the time 4 hits per call. So GAE need to scale up very quickly and some requests get lost (which is just a hang up on the user side) at the beginning of the trigger, when only one instance is ready.
I would like to know if it is possible to programmatically scale up App Engine (through an API?) before running such triggers in order to be ready when the storm will blast?
I think you may want to give warmup requests a try. As they load your app's code into a new instance before any live requests reach that instance. Thus reducing the time it takes to answer while you GAE instance has scaled down to zero.
The link I have shared with you, includes the PHP7 runtime as I see you are familiar with it.
I would also like to agree with John Hanley, since finding a sweet spot on how many idle instances you have available, would also help the performance of your app.
Finally, the solution was to delegate sending the communication through Cloud Tasks:
https://cloud.google.com/tasks/docs/creating-appengine-tasks
https://github.com/redcall-io/app/blob/master/symfony/src/Communication/Processor/QueueProcessor.php
Tasks can try again hitting the app engine in case of errors, and make the app engine pop new instances when the surge comes.

Can GAE be configured to launch a new instance per request?

I'm building a data processing application, where I want an incoming (REST) request to cause a cloud instance to be started, do some processing, then retrieve the results. Typically, it would be something like this:
receive request
start instance
send request to instance
instance processes (~100% load on all instance CPUs)
poll service running on instance for status
fetch results from instance
shut down instance
I was planning on doing the instance management manually using something like jclouds, but am wondering if GAE could be configured to do something like this (saving me work).
If I have my processing service set up in GAE, can I make it so that a new instance is launched for every incoming request (or whenever the current instance(s) are at 100% CPU usage)?
Referring to instance management only (i.e. 1-4 and 7)...
From Scaling dynamic instances:
The App Engine scheduler decides whether to serve each new request
with an existing instance (either one that is idle or accepts
concurrent requests), put the request in a pending request queue, or
start a new instance for that request. The decision takes into account
the number of available instances, how quickly your application has
been serving requests (its latency), and how long it takes to spin up
a new instance.
Each instance has its own queue for incoming requests. App Engine
monitors the number of requests waiting in each instance's queue. If
App Engine detects that queues for an application are getting too long
due to increased load, it automatically creates a new instance of the
application to handle that load.
App Engine also scales instances in reverse when request volumes
decrease. This scaling helps ensure that all of your application's
current instances are being used to optimal efficiency and cost
effectiveness.
So in the scaling configuration I'd keep automatic_scaling (which is the default) and play with:
max_pending_latency:
The maximum amount of time that App Engine should allow a request
to wait in the pending queue before starting a new instance to handle
it. The default value is "30ms".
A low maximum means App Engine will start new instances sooner for pending requests, improving performance but raising running costs.
A high maximum means users might wait longer for their requests to be served (if there are pending requests and no idle instances to
serve them), but your application will cost less to run.
min_pending_latency:
The minimum amount of time that App Engine should allow a request to
wait in the pending queue before starting a new instance to handle it.
A low minimum means requests must spend less time in the pending queue when all existing instances are active. This improves
performance but increases the cost of running your application.
A high minimum means requests will remain pending longer if all existing instances are active. This lowers running costs but increases
the time users must wait for their requests to be served.
See also in Change auto scaling performance settings:
Min Pending Latency - Raising Min Pending Latency instructs App Engine’s scheduler to not start a new instance unless a request
has been pending for more than the specified time. If all instances
are busy, user-facing requests may have to wait in the pending queue
until this threshold is reached. Setting a high value for this setting
will require fewer instances to be started, but may result in high
user-visible latency during increased load.
You may also want to take a look at Warmup requests, in case you want to reduce the latency for requests which would cause a new instance to be started.

GAE: What's the difference between <min-pending-latency> and <max-pending-latency>?

As far as I can read the docs, both settings do the same thing: start a new instance when a request has spent in pending queue longer than that setting says.
<max-pending-latency> The maximum amount of time that App Engine should allow a request to wait in the pending queue before starting a new instance to handle it. Default: "30ms".
A low maximum means App Engine will start new instances sooner for pending requests, improving performance but raising running costs.
A high maximum means users might wait longer for their requests to be served, if there are pending requests and no idle instances to serve them, but your application will cost less to run.
<min-pending-latency>
The minimum amount of time that App Engine should allow a request to wait in the pending queue before starting a new instance to handle it.
A low minimum means requests must spend less time in the pending queue when all existing instances are active. This improves performance but increases the cost of running your application.
A high minimum means requests will remain pending longer if all existing instances are active. This lowers running costs but increases the time users must wait for their requests to be served.
Source: https://cloud.google.com/appengine/docs/java/config/appref
What's the difference between min and max then?
The piece of information you might be missing to understand these settings is that App Engine can choose to create an instance at any time between min-pending-latency and max-pending-latency.
This means an instance will never be created to serve a pending request before min-pending-latency and will always be created once max-pending-latency has been reached.
I believe the best way to understand is to look at the the timeline of events when a request enters the pending queue:
A request reaches the application but no instance are available to serve it so it is placed in the pending requests queue.
Until the min-pending-latency is reached: App Engine tries to find an available instance to serve the request and will not create a new instance. If a request is served below this threshold, it is a signal for App Engine to scale down.
After the min-pending-latency is reached and until max-pending-latency is reached: App Engine tries to find an available instance to serve the request.
After the max-pending-latency is reached: App Engine stops searching for an available instance to serve the request and creates a new instance.
Source: app.yaml automatic_scaling element

Resources