How to have app engine avoid cold starts? - google-app-engine

Even when there are instances already running, I am still experiencing cold starts on some of the requests.
I thought that GAE would start some instances in the background and add them to the pool of active instances that serve requests only once the instances are started. Is that not the case? Is there a way to configure GAE to make it so?
Instead, it seems some requests wait the full duration of a new instance's startup, which can take up to 10 seconds, even though the existing instances alone could have served all the benchmark traffic in under a couple of seconds.
UPDATE:
This is my app.yaml config:
runtime: nodejs10
env: standard
instance_class: F1
handlers:
- url: '.*'
  script: auto
automatic_scaling:
  min_instances: 1
  max_instances: 3

What you're looking for are the Warmup requests:
Warmup requests are a specific type of loading request that load
application code into an instance ahead of time, before any live
requests are made. Manual or basic scaling instances do not receive an
/_ah/warmup request.
And from Configuring warmup requests:
Loading your app's code to a new instance can result in loading
requests. Loading requests can result in increased request latency
for your users, but you can avoid this latency using warmup
requests. Warmup requests load your app's code into a new instance
before any live requests reach that instance.
Not 100% perfect - there are some limitations, but they're the next best thing.
Configuring warmup requests means:
Enabling warmup requests in your app.yaml file:
inbound_services:
- warmup
Creating your handler for the '/_ah/warmup' warmup requests URL
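The handler itself only needs to answer /_ah/warmup with a 200 status. As an illustration (sketched here as a plain Python WSGI app; in the question's Node.js runtime it would be an ordinary route doing the same thing):

```python
def app(environ, start_response):
    """Tiny WSGI app whose only job is to answer warmup requests."""
    if environ.get("PATH_INFO") == "/_ah/warmup":
        # Do any expensive one-time setup here (open connections, fill caches)
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"warmed up"]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

Any response in the 200-299 range counts as a successful warmup; the body is ignored.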

Related

App Engine urlfetch DeadlineExceededError

I have 2 services. One is hosted in Google App Engine and one is hosted in Cloud Run.
I use urlfetch (Python 2), imported from google.appengine.api, in GAE to call APIs provided by the Cloud Run service.
Occasionally a few (fewer than 10 per week) DeadlineExceededError errors show up like this:
Deadline exceeded while waiting for HTTP response from URL
But in the past few days the error has suddenly been occurring frequently (~40 per day). I'm not sure whether it is due to Christmas peak traffic or something else.
I've checked the Load Balancer logs of Cloud Run, and it turned out the requests never reached the Load Balancer.
Has anyone encountered a similar issue before? Is anything wrong with GAE urlfetch?
I found a conversation about a similar issue, but the suggestion was to handle the error...
I wonder what I can do to mitigate the issue. Many thanks.
Update 1
Checked again and found that some requests from App Engine did show up in the Cloud Run Load Balancer logs, but the timing is weird:
e.g.
Logs from GAE project
10:36:24.706 send request
10:36:29.648 deadline exceeded
Logs from Cloud Run project
10:36:35.742 reached load balancer
10:36:49.289 finished processing
Not sure why it took so long for the request to reach the Load Balancer...
Update 2
I am using GAE Standard located in US with the following settings:
runtime: python27
api_version: 1
threadsafe: true
automatic_scaling:
  max_pending_latency: 5s
inbound_services:
- warmup
- channel_presence
builtins:
- appstats: on
- remote_api: on
- deferred: on
...
The Cloud Run hosted API gateway I was trying to call is located in Asia. In front of it there is a Google Load Balancer whose type is HTTP(S) (classic).
Update 3
I wrote a simple script that directly calls the Cloud Run endpoint using axios (with its timeout set to 5s) periodically. After a while, some requests timed out. I checked the logs in my Cloud Run project and found 2 different phenomena:
For request A, pretty much like what I mentioned in Update 1, logs were found for both Load Balancer and Cloud Run revision.
The time of the CR revision log minus the time of the LB log was > 5s, so I think this is an expected timeout.
But for request B, no logs were found at all.
So I guess the problem is not about urlfetch nor GAE?
Deadline exceeded while waiting for HTTP response from URL is actually a DeadlineExceededError. The URL was not fetched because the deadline was exceeded. This can occur with either the client-supplied deadline (which you would need to change), or the system default if the client does not supply a deadline parameter.
When you are making a HTTP request, App Engine maps this request to URLFetch. URLFetch has its own deadline that is configurable. See the URLFetch documentation.
You can set a deadline for each URLFetch request. By default, the deadline for a fetch is 5 seconds. You can change this default by:
For Java apps, including the following appengine.api.urlfetch.defaultDeadline setting in your appengine-web.xml configuration file. Specify the timeout in seconds:
<system-properties>
  <property name="appengine.api.urlfetch.defaultDeadline" value="10"/>
</system-properties>
For Python, you can also adjust the default deadline by using the urlfetch.set_default_fetch_deadline() function. This function stores the new default deadline in a thread-local variable, so it must be set for each request, for example in custom middleware.
from google.appengine.api import urlfetch
urlfetch.set_default_fetch_deadline(45)
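If a small number of timeouts remains even after raising the deadline, wrapping the call in a retry with exponential backoff is a common mitigation. A generic sketch (`fetch_with_retry` is a hypothetical helper, not part of the urlfetch API):

```python
import time

def fetch_with_retry(fetch, retries=3, backoff=0.5):
    # `fetch` is any zero-argument callable, e.g. lambda: urlfetch.fetch(url).
    # On failure, wait backoff, 2*backoff, 4*backoff... between attempts,
    # and re-raise the exception after the last attempt.
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * (2 ** attempt))
```

Only retry requests that are safe to repeat (idempotent GETs, not payment POSTs).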
If your Cloud Run service is processing long requests, you can increase the request timeout. If your service doesn't return a response within the time specified, the request ends and the service returns an HTTP 504 error.
Update the timeoutSeconds attribute in YAML file as :
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: SERVICE
spec:
  template:
    spec:
      containers:
      - image: IMAGE
      timeoutSeconds: VALUE
OR
You can update the request timeout for a given revision at any time by using the following command:
gcloud run services update [SERVICE] --timeout=[TIMEOUT]
If requests are terminating earlier with error code 503, you might need to update the request timeout setting for your language framework:
Node.js developers might need to update the server.timeout property via server.setTimeout (use server.setTimeout(0) to achieve an unlimited timeout) depending on the version you are using.
Python developers need to update Gunicorn's default timeout.
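For example, Gunicorn's default worker timeout is 30 seconds; it can be raised or disabled on the command line (the `main:app` entry point below is an assumption, substitute your own module and app object):

```shell
# --timeout 0 disables Gunicorn's worker timeout entirely
gunicorn --bind :$PORT --timeout 0 main:app
```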

Google App Engine throws error on Basic Scaling

I'm using golang & Google App Engine for the project. I had a task where I received a huge file, split it into lines and sent these lines one by one to a queue to be resolved. My initial setting for scaling inside the app.yaml file was the following:
instance_class: F1
automatic_scaling:
  min_instances: 0
  max_instances: 4
  min_idle_instances: 0
  max_idle_instances: 1
  target_cpu_utilization: 0.8
  min_pending_latency: 15s
It was working all right, but it had an issue: since there were really a lot of tasks, it would fail after 10 minutes (as per the documentation, of course). So I decided to use the B1 instance class instead of F1, and this is where things went wrong.
My setup for B1 looks like this:
instance_class: B1
basic_scaling:
  max_instances: 4
Now, I've created a very simple demo to demonstrate the idea:
r.GET("foo", func(c *gin.Context) {
    _, err := tm.CreateTask(&tasks.TaskOptions{
        QueueID:  "bar",
        Method:   "method",
        PostBody: "foooo",
    })
    if err != nil {
        lg.LogErrorAndChill("failed, %v", err)
    }
})
r.POST("bar/method", func(c *gin.Context) {
    data, err := c.GetRawData()
    if err != nil {
        lg.LogErrorAndPanic("failed", err)
    }
    fmt.Printf("data is %v \n", string(data))
})
To explain the logic behind it: I send a request to "foo" which creates a task which is added to the queue with some body text. Inside the task a post method is being called based on the queueId and method parameters, which receives some text and in this simple example just logs it out.
Now, when I run the request, I get the 500 error, which looks like this:
[GIN] 2021/10/05 - 19:38:29 | 500 | 301.289µs | 0.1.0.3 | GET "/_ah/start"
And in the logs I can see:
Process terminated because it failed to respond to the start request with an HTTP status code of 200-299 or 404.
And inside the queue in the task (reason to retry):
INTERNAL(13): Instance Unavailable. HTTP status code 500
Now, I've read the documentation and I know about the following:
Manual, basic, and automatically scaling instances startup differently. When you start a manual scaling instance, App Engine immediately sends a /_ah/start request to each instance. When you start an instance of a basic scaling service, App Engine allows it to accept traffic, but the /_ah/start request is not sent to an instance until it receives its first user request. Multiple basic scaling instances are only started as necessary, in order to handle increased traffic. Automatically scaling instances do not receive any /_ah/start request.
When an instance responds to the /_ah/start request with an HTTP status code of 200–299 or 404, it is considered to have successfully started and can handle additional requests. Otherwise, App Engine terminates the instance. Manual scaling instances are restarted immediately, while basic scaling instances are restarted only when needed for serving traffic
But it is not really helpful - I don't understand why the /_ah/start request does not respond properly and I am not really sure how to debug it or how to fix it, especially since the F1 instance was working ok.
Requests to the URL /_ah/start are routed to your app, and your app apparently is not ready to handle them, which leads to the 500 response. Check your logs.
Basically, your app needs to be ready for incoming requests to the URL /_ah/start (the same way it is ready to handle requests to /foo). If you run the app locally, try to open that URL (via curl, etc.) and see what the response is. It needs to respond with a status code of 200–299 or 404 (as mentioned in the text you quoted); otherwise the instance won't be considered successfully started.

App Engine - is it mandatory to use warmup request to use min_instances?

I currently use app engine standard environment with django. I want to have automatic scaling and always have at least one instance running.
Consulting the documentation it says that to use min_instances it is recommended to have warm up requests enabled.
My question is: is this mandatory? Is there no way to always have an active instance without using warm up requests?
This is probably more of a question for Google engineers. But I think they are required. The docs don't say "recommended"; they say "must".
Imagine if your instances shut down because of a server reboot. The warmup request gets them running again. A start request would also do the trick, but after some delay. It could be that Google depends on sending warmup requests after reboot, and not start.
UPDATE
You just need a simple url handler that returns a 200 response. Could be something as simple as this in your app.yaml:
- url: /_ah/warmup # just serve simple, quick
  static_files: static/img/favicon.ico
  upload: static/img/favicon.ico
Or better, in your urls.py, point the url handler to a view like this:
(r'^_ah/warmup$', 'warmup'),
in views.py:
from django.http import HttpResponse

def warmup(request):
    return HttpResponse('hello', content_type='text/plain')

Google app engine prevent OPTIONS request between two services

I've created a GAE project and I deployed two services:
default (https://myservice.appspot.com) for the front-end app
backend (https://backend-dot-myservice.appspot.com) for the backend (Node.js)
I've also added a custom domain so that the default service is reachable also at https://myservice.com.
The problem I have is that each AJAX request performed by the browser is preceded by an OPTIONS request (to handle CORS).
What's the best solution to avoid this OPTIONS request? It would be fixed if both the front end and back end were on the same host, but how can I do that on Google App Engine?
Thank you!
I solved it by adding a dispatch.yaml file to the default service:
dispatch:
- url: "*/api/*"
  service: backend
where backend is my backend service.
And I changed my backend to listen on addresses like /api/something.
So now the browser has origin https://myservice.com, and the URLs of AJAX requests to the backend look like https://myservice.com/api/something.
Since client and server now share the same origin, the CORS settings are no longer needed, and the OPTIONS request is not performed by the browser.
I don't know if it's the best solution, but for me it worked.
As mentioned in this Stack Overflow post:
OPTIONS requests are pre-flight requests in Cross-origin resource sharing (CORS).
This pre-flight request is made by some browsers as a safety measure to ensure that the request being done is trusted by the server. Meaning the server understands that the method, origin and headers being sent on the request are safe to act upon.
Your server should not ignore but handle these requests whenever you're attempting to do cross origin requests.
CORS Support for Google App Engine in your app.yaml:
One important use of this feature is to support cross-origin resource sharing (CORS), such as accessing files hosted by another App Engine app.
For example, you could have a game app mygame.appspot.com that accesses assets hosted by myassets.appspot.com. However, if mygame attempts to make a JavaScript XMLHttpRequest to myassets, it will not succeed unless the handler for myassets returns an Access-Control-Allow-Origin: response header containing the value http://mygame.appspot.com.
handlers:
- url: /images
  static_dir: static/images
  http_headers:
    Access-Control-Allow-Origin: http://mygame.appspot.com
Note: if you wanted to allow everyone to access your assets, you could use the wildcard '*', instead of http://mygame.appspot.com.

dispatch.yaml in Google App Engine has increased the response time

Based on 100 requests.
Region: southamerica-east1
When executing a GET at xxx.appspot.com/api/v1/ping the average response time is +/- 50 ms.
Example: Load time: 83 ms
When activating dispatch.yaml (gcloud app deploy dispatch.yaml) and executing the request with the new URL, xxx.mydomain.com/api/v1/ping, the average response time is 750 ms.
Example Load time: 589 ms
dispatch.yaml
dispatch:
- url: "*/api/*"
  service: my-service
I'm using Spring Boot on the server. Here is my app.yaml:
service: my-service
runtime: java
env: flex
threadsafe: true
runtime_config: # Optional
  jdk: openjdk8
handlers:
- url: /api/*
  script: this field is required, but ignored
manual_scaling:
  instances: 1
resources:
  cpu: 2
  memory_gb: 2.3
How do I improve the response time?
Am I using the dispatch correctly to associate my requests with my domain?
curl -w "@curl-format.txt" -o ./ -s http://my.domnai.com/
time_namelookup: 0,253
time_connect: 0,328
time_appconnect: 0,000
time_pretransfer: 0,328
time_redirect: 0,000
time_starttransfer: 1,713
----------
time_total: 1,714
curl -w "@curl-format.txt" -o ./ -s http://my-app.appspot.com/
time_namelookup: 0,253
time_connect: 0,277
time_appconnect: 0,000
time_pretransfer: 0,277
time_redirect: 0,000
time_starttransfer: 0,554
----------
time_total: 0,554
Using a custom domain is rather orthogonal to using a dispatch file.
When App Engine receives a request it first needs to determine which application is the request destined to. By default it does that using exclusively the requests's domain name, be it appspot.com or a custom domain. From Requests and domains:
App Engine determines that an incoming request is intended for your
app by using the domain name of the request.
While making this decision it also determines a particular version of a service in the application to send the request to, based on the rules described in Routing via URL.
Requests using a custom domain might require some additional processing compared to using appspot.com (I'm unsure about this), which could explain some increase in the response time. This can be confirmed by measurements. But if so, I don't think there's anything you can do about it.
Note that a dispatch file is not required to make the above-mentioned routing decisions. Even if you use a custom domain. In fact there is no reference to the dispatch file anywhere in Adding a custom domain for your application. But if want to alter these decisions then you need to use a dispatch file.
The dispatch file allows the request path to also be taken into account (in addition to the request domain name) when making the routing decisions.
Using a dispatch file will increase the response time as the request domain and path must be sequentially compared against each and every rule in the dispatch file, until a match is found. If no match is found the request will be sent to the version of the app's default service configured to receive traffic. You can slightly reduce the processing time for particular services by placing their rules earlier in the dispatch file, but that's about all you can do.
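The first-match scan described above can be pictured with a few lines of Python (illustrative only — not App Engine's actual matcher, and real dispatch rules only allow a `*` at the start or end of a pattern):

```python
from fnmatch import fnmatch

def dispatch(url, rules, default_service="default"):
    # Rules are compared in order; the first glob that matches wins,
    # so rules for hot paths belong near the top of dispatch.yaml.
    for pattern, service in rules:
        if fnmatch(url, pattern):
            return service
    # No rule matched: fall back to the default service.
    return default_service

rules = [("*/api/*", "my-service")]
```

This also shows why every non-matching request pays for the full scan before falling through to the default service.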
