Initializing Go AppEngine app with Cloud Datastore - google-app-engine

In the init() function of a GAE Go app, how can I set up initial values for my application?
How can I read from Cloud Datastore in the init() function or immediately after the application starts up? As I understand it, the server cannot write to the local filesystem, so is Cloud Datastore the only option?
I need some global variables and slices of data.

Using static files
On AppEngine you don't have access to the file system of the host operating system, but you can access files of your web application (you have read-only permission, you can't change them and you can't create new files in the app's folder).
So the question is: can your application's code change the data that you want to read and use for initialization? Or is it fine if it is deployed "statically" with your app's code?
If you don't need to change it (or only when you redeploy your app), the easiest option is to store it as a "static" file as part of your webapp. You may refer to files of your app using relative paths, where the current or working directory is your app's root. E.g. if your app contains a data folder in its root (where app.yaml resides), and there is an init_values.txt file inside the data folder, you can refer to it with the path: data/init_values.txt.
One important note: not every file is readable by code; this depends on the app configuration. Quoting from Configuring with app.yaml / Static file handlers:
If you have data files that need to be read by the application code, the data files must be application files, and must not be matched by a static file pattern.
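As a rough illustration of the static-file approach, the sketch below reads data/init_values.txt (the path used in the example above) into a slice at start-up. The names initValues, parseLines and loadInitValues are hypothetical, the one-value-per-line file format is an assumption, and os.ReadFile assumes Go 1.16 or later. In a real app the call in main could equally sit in a package init() function.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// initValues holds the values loaded at start-up (hypothetical global).
var initValues []string

// parseLines splits raw file content into trimmed, non-empty lines.
func parseLines(raw string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(raw))
	for sc.Scan() {
		if line := strings.TrimSpace(sc.Text()); line != "" {
			out = append(out, line)
		}
	}
	return out
}

// loadInitValues reads the file at path, which is resolved relative to
// the working directory (the app's root on App Engine).
func loadInitValues(path string) ([]string, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	return parseLines(string(raw)), nil
}

func main() {
	vals, err := loadInitValues("data/init_values.txt")
	if err != nil {
		fmt.Println("could not load init values:", err)
		return
	}
	initValues = vals
	fmt.Println("loaded", len(initValues), "values")
}
```

If the file is matched by a static file pattern in app.yaml, the read is expected to fail, which is why the error is checked rather than ignored.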
Using the Datastore
You can't use AppEngine services that require a Context outside of handlers (because the creation of a Context requires an *http.Request value). This by nature means you can't use them in package init() functions either.
Note that you can use them from cron jobs and tasks added to task queues, because tasks and cron jobs are executed by issuing HTTP requests to your handlers.
You have to restructure your code so that your initialization (e.g. reading from the Datastore) gets called from a handler.
Example of achieving this with Once.Do():
var once sync.Once

func MainHandler(w http.ResponseWriter, r *http.Request) {
	ctx := appengine.NewContext(r)
	// Run the one-time setup on the first request this instance serves.
	once.Do(func() { mysetup(ctx) })
	// do your regular stuff here
}

func mysetup(ctx appengine.Context) {
	// This function is executed only once.
	// Read from Datastore and initialize your vars here.
}
"Utilizing" warmup requests
Running the initialization in a handler may cause the first requests to take considerably longer to serve. To avoid this, I recommend using warmup requests. A warmup request is issued to a new instance before it goes "live", that is, before it starts serving user requests. In your app.yaml config file you can enable warmup requests by adding - warmup to the inbound_services directive:
inbound_services:
- warmup
This will cause the App Engine infrastructure to first issue a GET request to /_ah/warmup. You can register a handler to this URL and perform initialization tasks. As with any other request, you will have an http.Request in the warmup handler.
But please note that:
..you may encounter loading requests, even if warmup requests are enabled in your app.
This means that in rare cases a new instance may not receive a warmup request, so it's best to check the initialization state in user handlers too.
Related questions:
How do I store the private key of my server in google app engine?
Fetching a URL From the init() func in Go on AppEngine
Environment variables specified on app.yaml but it's not fetching on main.go

Related

Google app engine - how to disable the cache

So some context:
I have a Node.js API running on Google App Engine. All my GET requests are cached by App Engine for 10 minutes by default.
I am using Cloudflare for my API, as this allows me to remove specific items from the cache when needed.
You can imagine this has caused a bit of an issue: my Cloudflare cache was correctly cleared, but App Engine kept returning old data.
According to the docs, you can set a default_expiration in the app.yaml file, but setting this to 0 or 0s has made no difference and Google keeps caching my responses.
Seemingly, there is also no way to fetch an uncached response from Google.
Now my obvious question is: is there some way I can completely bypass this cache? Preferably without having to set my entire API's responses to private with a 0s cache.
It quite irks me that Google is forcing this cache on me and provides very vague documentation on the whole matter.
You can configure your app.yaml to define a cache period.
If you use default_expiration this will set a global default cache period for all static file handlers for an application. If omitted, the production server sets the expiration to 10 minutes by default.
To set specific expiration times for individual handlers, specify the expiration element within the handler element in your app.yaml file. You can change the duration of time web proxies and browsers should cache a static file served by this handler.
default_expiration: "4d 5h"
handlers:
- url: /stylesheets
  static_dir: stylesheets
  expiration: "0d 0h"
It seems you are referring to the static cache (per your link). Try cache-busting techniques such as adding a query parameter to the URL, e.g.
https://<url>/?{{APP_VERSION_ID}}
where APP_VERSION_ID is the latest version of your deployed app. This way, each time you redeploy your app, APP_VERSION_ID changes and the latest version of your static files is always loaded.
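The cache-busting idea can be sketched as a small helper that appends the deployed version to a URL. This is only an illustration: the function name versionedURL and the query key "v" are hypothetical (the example above uses the bare version ID as the query string), example.com is a placeholder, and reading the version from the GAE_VERSION environment variable assumes a runtime that sets it.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
)

// versionedURL returns rawURL with a "v" query parameter set to version,
// keeping any query parameters that were already present.
func versionedURL(rawURL, version string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("v", version)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// Assumption: the runtime exposes the deployed version as GAE_VERSION.
	version := os.Getenv("GAE_VERSION")
	if version == "" {
		version = "dev" // fallback for local runs
	}
	u, err := versionedURL("https://example.com/stylesheets/main.css", version)
	if err != nil {
		fmt.Println("bad URL:", err)
		return
	}
	fmt.Println(u)
}
```

Because the URL changes with every deployment, caches treat each version as a new resource instead of serving the stale copy.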

Is catch-all handler pointing to "auto" a bad idea?

My instance has little to no traffic, but I have min-idle instances set to 1. What I notice is that whenever a random URL that doesn't exist is accessed (by some bot), it is considered a dynamic request, since my catch-all handler is auto. This is fine, except I see these 404 errors (404 because there are no HTTP handlers associated with these URL patterns, even though the yaml defines a catch-all pattern) resulting in instance restarts. Why should the instance restart if it runs into 404 errors?
All my dynamic handlers follow the "/api" pattern, plus a few that don't. So I can explicitly list all valid patterns and map them to the auto handler. Would these random links then be treated as static but not present, and return a 404 error (which I am fine with)? I want to make sure the instance doesn't keep running just because of some rogue requests.
I just did a local experiment (I don't presently have any quickly deployable play app) and it looks like your quite interesting idea could work.
I replaced the .* pattern previously catching all stragglers and routing them to my default service script (I'm using the python runtime) with specific patterns, then added this handler after all others:
- url: /(.*)$
  static_files: images/\1
  upload: images/.*
My images directory is real, holding static images (but for which I already have another handler with a more specific pattern).
With this in place I made a request to /crap and got, as expected (there is no images/crap file):
INFO 2019-11-08 03:06:02,463 module.py:861] default: "GET /crap HTTP/1.1" 404 -
I added logging calls in my script handler's get() and dispatch() calls to confirm they're not actually getting invoked (the development server request logging casts a bit of doubt).
I also checked on an already deployed GAE app that requesting an image that matches a static handler pattern but which doesn't actually exist gets the 404 answer without causing a service's instance to be started (no instance was running at the time), i.e. it comes directly from the GAE's static content CDN.
So I think it's well worth a try with the go runtime, this could save some significant instance time for an app without a lot of activity faced with random bot traffic.
As for the instance restarts, I suspect what you see is just a symptom of your min-idle instance setting of 1. Unlike a dynamic instance, the idle (aka resident) instance is not normally meant to handle traffic; it's just ready to do so if/when needed. Only when there is no dynamic instance running (and able to handle incoming traffic efficiently) and a new request comes in is that request immediately routed to the idle instance. At that moment:
the idle instance becomes a dynamic one and will continue to serve traffic until it shuts due to inactivity or dies
a fresh idle instance is started to meet the min-idle configuration, it will remain idle until another similar event occurs
Note: your idea will help with the instance hours portion used by the dynamic instances, but not with the idle instance portion.
According to the documentation:
"When an instance responds to the /_ah/start request with an HTTP status code of 200–299 or 404, it is considered to have started correctly and to be able to handle additional requests. Otherwise, App Engine terminates the instance. Manual scaling instances are restarted immediately, while basic scaling instances are restarted only when needed to serve traffic."
You can find more detail about how instances are managed for Standard App Engine environment for Go 1.12 on the link: https://cloud.google.com/appengine/docs/standard/go112/how-instances-are-managed
I also recommend reading the document "How instances are managed", which states:
"Secondary routing
If a request matches the [YOUR_PROJECT_ID].appspot.com part of the host name, but includes the name of a service, version, or instance that does not exist, the request is routed to the default service. Secondary routing does not apply to custom domains; requests sent to these domains return an HTTP 404 status code if the hostname is not valid."
https://cloud.google.com/appengine/docs/standard/go112/how-instances-are-managed

Init and destroy function

I am still a beginner with Go on Google Cloud App Engine (standard).
I want to use a function that is called automatically when the instance shuts down.
There is an init function that is called during startup.
Now I am looking for the opposite, something like a destroy function.
It seems there is something like that for Python, but I could not find anything for Go.
How could you implement such a destroy function for Google App Engine instances?
This is documented at Go - How Instances are Managed.
Unfortunately the Go doc is incomplete; here's the link to the Python version: Python - How Instances are Managed. The way it is implemented / supported is language-agnostic.
When an instance is spun up, an HTTP GET request is sent to the /_ah/start path.
Before an instance is taken down, an HTTP GET request is sent to the /_ah/stop path.
You should use package init() functions for initialization purposes, as they always run, and only once. If your initialization requires a request, then register a handler for the /_ah/start path.
And you may register a handler to /_ah/stop and implement "shutdown" functionality like this:
func init() {
	http.HandleFunc("/_ah/stop", shutdownHandler)
}

func shutdownHandler(w http.ResponseWriter, r *http.Request) {
	doSomeWork()
	saveState()
}
But you can't rely on this 100%:
Note: It's important to recognize that the shutdown hook is not always able to run before an instance terminates. In rare cases, an outage can occur that prevents App Engine from providing 30 seconds of shutdown time. Thus, we recommend periodically checkpointing the state of your instance and using it primarily as an in-memory cache rather than a reliable data store.

How to warm up app engine endpoint

I have an App Engine endpoint and am trying to reduce latency on the first few calls to a newly created endpoint instance. The application is written in Java and the endpoints are auto-scaled.
To address this issue I configured an idle instance, but even when an instance has already been created, the first few calls routed to it take some extra time. Following the documentation, I implemented a custom servlet handling warmup requests and marked the EndpointsServlet as load-on-startup.
Inside the warmup servlet I put code that initializes some commonly used services, loads some data, etc. The effect was almost impossible to notice.
After that, I implemented calls to methods exposed by the endpoint, like this:
call("/_ah/api/teamly/v1/test/dummy")
It works in some cases (even most of them), and after calling a few key methods the instance is really ready to serve. The problem I'm facing now is that if I'm using auto scaling for a module, I can't route the request to a specific instance.
So the question is:
How should I properly warm up the endpoint instance to avoid loading requests initiated from the frontend?
You need to register a handler for /_ah/warmup and then make calls from it to any resources you want warmed up. You can find detailed information at:
Configuring Warmup Requests to Improve Performance

Is it thread-safe to store data inside a static field when deploying on Google App Engine?

I was browsing through the code of Vosao CMS, an open source CMS hosted on Google App Engine (which I think is an awesome idea), and I stumbled upon the following code inside the CurrentUser class:
/**
 * Current user session value cache class. Due to GAE single-threaded nature
 * we can cache currently logged in user in static property. Which we set in
 * authentication filter.
 */
public class CurrentUser {
    private static UserEntity user;

    public static UserEntity getInstance2() {
        return user;
    }

    public static void setInstance2(UserEntity aUser) {
        user = aUser;
    }
}
I've never used GAE, but this sounds really weird to me.
Is GAE really "single threaded"? Is it safe to store request-scoped data inside a static field when using GAE?
Does this mean that, for each JVM instance, only one HTTP request will be executed at a time, while all the other requests are waiting?
Is this a common GAE idiom? If not, what would be the best GAE idiom to store such a UserEntity for the duration of a request? Shouldn't one use a ThreadLocal here, like they do in Spring Security? Or some kind of scoped bean (managed by the Dependency Injection container)?
Is GAE really "single threaded"? Is it safe to store request-scoped data inside a static field when using GAE?
It used to be that way (until 1.4.3) and it still is by default.
Now, you can specify that your app is threadsafe, and then you will receive concurrent requests to the same JVM/servlet.
Does this mean that, for each JVM instance, only one HTTP request will be executed at a time, while all the other requests are waiting?
For other requests, you would probably get another JVM, but that is outside of your control. They can also just wait.
Currently, the Java and Python runtimes on App Engine are both single threaded; you're correct that this means only one HTTP request will be executed per JVM, but multiple JVMs will be started simultaneously to handle multiple incoming requests.
This could change at any time in the future, however - the Java Servlet spec permits multi-threading. As a result, you should definitely use a ThreadLocal.
