GAE behavior when relocating an application to another server

Two questions:
Does Google App Engine send any kind of message to an application just before relocating it to another server?
If so, what is that message?

No, it doesn't. It doesn't relocate either: old instances keep running (and eventually stop when idle for long enough) while new ones are spawned.

There are times when App Engine needs to move your instance to a different machine to improve load distribution.
When App Engine needs to turn down a manual scaling instance, it first
notifies the instance. There are two ways to receive this
notification. First, the is_shutting_down() method from
google.appengine.api.runtime begins returning true. Second, if you
have registered a shutdown hook, it will be called. It's a good idea
to register a shutdown hook in your start request. After the
notification is issued, existing requests are given 30 seconds to
complete, and new requests immediately return 404.
If an instance is
handling a request, App Engine pauses the request and runs the
shutdown hook. If there is no active request, App Engine sends an
/_ah/stop request, which runs the shutdown hook. The /_ah/stop request
bypasses normal handling logic and cannot be handled by user code; its
sole purpose is to invoke the shutdown hook. If you raise an exception
in your shutdown hook while handling another request, it will bubble
up into the request, where you can catch it.
The following code sample demonstrates a basic shutdown hook:
from google.appengine.api import apiproxy_stub_map
from google.appengine.api import runtime

def my_shutdown_hook():
    apiproxy_stub_map.apiproxy.CancelApiCalls()
    save_state()
    # May want to raise an exception

runtime.set_shutdown_hook(my_shutdown_hook)
Alternatively, the following sample demonstrates how to use the is_shutting_down() method:
while more_work_to_do and not runtime.is_shutting_down():
    do_some_work()
    save_state()
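A small sketch tying the pieces together, registering the hook from a /_ah/start handler as the docs suggest ("register a shutdown hook in your start request"). This is webapp2-style; my_shutdown_hook is the function from the sample above:

import webapp2
from google.appengine.api import runtime

class StartHandler(webapp2.RequestHandler):
    def get(self):
        # Register the shutdown hook as soon as the instance starts.
        runtime.set_shutdown_hook(my_shutdown_hook)

app = webapp2.WSGIApplication([
    ('/_ah/start', StartHandler),
])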
More details here: https://developers.google.com/appengine/docs/python/modules/#Python_Instance_states

Related

Is catch-all handler pointing to "auto" a bad idea?

My instance has little to no traffic, but I have min-idle instances set to 1. What I notice is that whenever some random URL that doesn't exist is accessed (via some bot), it is considered a dynamic request, since my catch-all handler is auto. This is fine, except I see these 404 errors (404 because there are no HTTP handlers associated with these URL patterns, even though the yaml defines a catch-all pattern) resulting in instance restarts. Why should the instance restart if it runs into 404 errors?
I have all my dynamic handlers follow the "/api" pattern, plus a few that don't. So I can explicitly list all valid patterns and map them to the auto handler. Would these random links then be treated as static but not present, and return a 404 error (which I am fine with)? I want to make sure the instance doesn't keep running just because of some rogue requests.
I just did a local experiment (I don't presently have any quickly deployable play app) and it looks like your quite interesting idea could work.
I replaced the .* pattern previously catching all stragglers and routing them to my default service script (I'm using the python runtime) with specific patterns, then added this handler after all others:
- url: /(.*)$
  static_files: images/\1
  upload: images/.*
My images directory is real, holding static images (but for which I already have another handler with a more specific pattern).
With this in place I made a request to /crap and got, as expected (there is no images/crap file):
INFO 2019-11-08 03:06:02,463 module.py:861] default: "GET /crap
HTTP/1.1" 404 -
I added logging calls in my script handler's get() and dispatch() methods to confirm they're not actually being invoked (the development server's request logging casts a bit of doubt).
I also checked on an already deployed GAE app that requesting an image which matches a static handler pattern but doesn't actually exist gets a 404 answer without causing a service instance to be started (no instance was running at the time), i.e. it comes directly from GAE's static content CDN.
So I think it's well worth a try with the go runtime, this could save some significant instance time for an app without a lot of activity faced with random bot traffic.
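Putting it together, a minimal app.yaml sketch of this idea might look like the following (the /api pattern comes from your question; main.app and the images directory are assumptions standing in for your real script and static dir):

handlers:
# Explicit dynamic patterns only, no .* catch-all routed to the app.
- url: /api/.*
  script: main.app
# Static catch-all: unmatched paths are answered by the static serving
# infrastructure, so a bogus URL gets a 404 without waking an instance.
- url: /(.*)$
  static_files: images/\1
  upload: images/.*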
As for the instance restarts, I suspect what you see is just a symptom of your min-idle instance count being set to 1. Unlike a dynamic instance, the idle (aka resident) instance is not normally meant to handle traffic; it's just ready to do so if/when needed. Only when there is no dynamic instance running (and able to handle incoming traffic efficiently) and a new request comes in is that request immediately routed to the idle instance. At that moment:
the idle instance becomes a dynamic one and will continue to serve traffic until it shuts down due to inactivity or dies
a fresh idle instance is started to meet the min-idle configuration; it will remain idle until another similar event occurs
Note: your idea will help with the instance hours portion used by the dynamic instances, but not with the idle instance portion.
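For reference, the min-idle setting being discussed is configured in app.yaml like this (standard environment, automatic scaling):

automatic_scaling:
  min_idle_instances: 1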
The documentation states the following:
"When an instance responds to the /_ah/start request with an HTTP status code of 200–299 or 404, it is considered to have started correctly and can handle additional requests. Otherwise, App Engine terminates the instance. Instances with manual scaling restart immediately, while instances with basic scaling restart only when necessary to serve traffic."
You can find more detail about how instances are managed in the App Engine standard environment for Go 1.12 at: https://cloud.google.com/appengine/docs/standard/go112/how-instances-are-managed
I also recommend reading the "How instances are managed" document, which states:
"Secondary routing
If a request matches the [YOUR_PROJECT_ID].appspot.com part of the hostname, but includes the name of a service, version, or instance that does not exist, the request is routed to the default service. Secondary routing does not apply to custom domains; requests sent to these domains will show an HTTP 404 status code if the hostname is not valid."
https://cloud.google.com/appengine/docs/standard/go112/how-instances-are-managed

Init and destroy function

I am still a beginner with golang in Google Cloud App Engine (standard).
I want to use a function that is automatically called when the instance is shutting down.
There is an init function called during startup.
Now I am looking for the opposite, something like a destroy function.
It seems there is something like that for Python, but I could not find
anything for golang.
How could you realize such a destroy function in Google App Engine instances?
This is documented at Go - How Instances are Managed.
Unfortunately the Go doc is incomplete; here's the link to the Python version: Python - How Instances are Managed. The way it is implemented / supported is language-agnostic.
When an instance is spun up, an HTTP GET request is sent to the /_ah/start path.
Before an instance is taken down, an HTTP GET request is sent to the /_ah/stop path.
You should use package init() functions for initialization purposes, as they always run, and only once. If a request is required for your init functions, then register a handler to the /_ah/start path.
And you may register a handler to /_ah/stop and implement "shutdown" functionality like this:
func init() {
    http.HandleFunc("/_ah/stop", shutdownHandler)
}

func shutdownHandler(w http.ResponseWriter, r *http.Request) {
    doSomeWork()
    saveState()
}
But you can't rely on this 100%:
Note: It's important to recognize that the shutdown hook is not always able to run before an instance terminates. In rare cases, an outage can occur that prevents App Engine from providing 30 seconds of shutdown time. Thus, we recommend periodically checkpointing the state of your instance and using it primarily as an in-memory cache rather than a reliable data store.

How to warm up app engine endpoint

I have an App Engine endpoint and am trying to reduce latency on the first few calls to a newly created endpoint instance. The application is written in Java and the endpoints are auto-scaled.
To address this issue I configured an idle instance, although even when an instance is already created, the first few calls routed to it consume some extra time. Following the documentation, I implemented a custom servlet handling warmup requests and marked the EndpointsServlet as load-on-startup.
Inside the warmup servlet I put code that initializes some commonly used services, loads some data, etc. The effect was almost impossible to notice.
After that I implemented calls to methods exposed by the endpoint, like:
call("/_ah/api/teamly/v1/test/dummy")
It works in some cases (even most of them), and after calling a few key methods the instance is really ready to serve. The problem I'm facing now is that if I'm using auto scaling for a module, I can't route a request to a specific instance.
So the question is:
How should I properly warm up the endpoint instance so that the loading isn't triggered by requests coming from the frontend?
You need to register a handler for /_ah/warmup and then make calls to any resources you want warmed up. You can find detailed information at:
Configuring Warmup Requests to Improve Performance
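The question is Java, but the mechanism is the same across runtimes; in Java you map a servlet to /_ah/warmup and enable warmup requests via <warmup-requests-enabled>true</warmup-requests-enabled> in appengine-web.xml. As a minimal sketch in Python terms (init_services() and load_reference_data() are hypothetical placeholders for your own loading code), with warmup enabled in app.yaml via inbound_services: warmup:

import webapp2

class WarmupHandler(webapp2.RequestHandler):
    def get(self):
        # Prime whatever the first real request would otherwise pay for.
        init_services()        # hypothetical: build service singletons
        load_reference_data()  # hypothetical: fill in-memory caches
        self.response.set_status(200)

app = webapp2.WSGIApplication([
    ('/_ah/warmup', WarmupHandler),
])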

Programmatically listing and sending requests to dynamic App Engine instances

I want to send a particular HTTP request (or otherwise communicate a message) to every (dynamic/autoscaled) instance which is currently running for a particular App Engine application.
My goal is to trigger each instance to discard some locally cached data (because I have just modified the underlying data and want them to reload it).
One possible solution is to store a value in Memcache, and have instances check this each time they handle a request to see if they should flush their cache. But this adds latency to every request.
Another possible solution would be to somehow stop all running instances. No fixed overhead, but some impact while instances are restarted.
An even less desirable solution would be to redeploy the application code in order to cause all instances to be stopped. This now adds additional delay on my end as a deployment takes some time.
You could use the management API to list the instances of a given version, but I'd suggest using something like the Pub/Sub API to create a subscription on each of your App Engine instances. Since each instance has its own subscription, any messages published to the monitored topic will be received by all instances.
You can create the subscription at startup (the /_ah/start endpoint may be useful), and then delete it at shutdown (using the /_ah/stop endpoint).
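A rough sketch of that idea in Python, assuming a runtime where the google-cloud-pubsub client library is usable and a cache-invalidation topic that already exists (the topic name and clear_local_cache() are assumptions):

import os
import uuid

from google.cloud import pubsub_v1

project = os.environ["GOOGLE_CLOUD_PROJECT"]
subscriber = pubsub_v1.SubscriberClient()
topic_path = subscriber.topic_path(project, "cache-invalidation")

# Unique subscription name per instance; GAE_INSTANCE is set on newer
# runtimes, with a random fallback just in case.
sub_name = "flush-" + os.environ.get("GAE_INSTANCE", uuid.uuid4().hex)
sub_path = subscriber.subscription_path(project, sub_name)
subscriber.create_subscription(name=sub_path, topic=topic_path)

def on_message(message):
    clear_local_cache()  # hypothetical: drop the locally cached data
    message.ack()

# Pulls messages on a background thread for the life of the instance.
future = subscriber.subscribe(sub_path, callback=on_message)

Publishing a single message to the topic after you modify the underlying data then reaches every live instance; remember to delete the subscription in your shutdown path so abandoned subscriptions don't accumulate.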

How can Google App Engine be prevented from immediately rescheduling tasks after status code 500?

I have a Google App Engine servlet that is cron-configured to run once a week. Since it will take more than 1 minute of execution time, it launches a task (i.e. another servlet, task/clear) on the application's default push task queue.
Now what I'm observing is this: if the task throws an exception (e.g. a NullPointerException inside its second servlet), this gets translated into HTTP status 500 (i.e. HttpURLConnection.HTTP_INTERNAL_ERROR) and Google App Engine apparently reacts by immediately relaunching the same task. It announces this by printing:
Web hook at http://127.0.0.1:8888/task/clear returned status code 500. Rescheduling..
I can see how this can sometimes be a feature, but in my scenario it's inappropriate. Can I request that Google App Engine not do such automatic rescheduling, or am I expected to use other status codes to indicate error conditions that would not cause rescheduling under its rules? Or is this something that happens only on the dev server?
BTW, I am currently also running other tasks (with different frequencies) on the same task queue, so throttling reschedules at the level of task queue configuration would be inconvenient (so I hope there is another/better option too).
As per https://developers.google.com/appengine/docs/java/taskqueue/overview-push#Java_Task_execution, the task must return a response code between 200 and 299.
You can either return the correct value, set the taskRetryLimit in RetryOptions, or check the X-AppEngine-TaskExecutionCount header when the task launches to see how many times it has been launched and act accordingly.
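For illustration, here is what a retry cap looks like in Python terms when enqueueing the task (/task/clear is the path from the question; the Java equivalent is RetryOptions#taskRetryLimit):

from google.appengine.api import taskqueue

# Cap retries at enqueue time so a 500 doesn't trigger endless rescheduling.
taskqueue.add(
    url='/task/clear',
    retry_options=taskqueue.TaskRetryOptions(task_retry_limit=1),
)

# Alternatively, inside the task's request handler (webapp2-style shown),
# read how many times the task has already run and act accordingly:
# count = int(self.request.headers.get('X-AppEngine-TaskExecutionCount', '0'))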
I think I've found a solution: in the Java API, there is a method RetryOptions#taskRetryLimit, which serves my case.
