How does Google App engines cold boot deal with global variables? - google-app-engine

My python API initializes a global variable which takes about 10 seconds to fully initialize before the server starts running. I'm wondering if when GAE initializes a new instance, this same initialization is required? or am I able to access the same variable across multiple instances?

This answer is just complementary to the other mentioned approaches, in most if not all cases they can be combined.
If you're in the standard environment you can take advantage of the warmup requests to well... warm (most of) your instances up before real traffic hits them.
Multithreading complexity doesn't really matter in such cases since you know that no other request can hit the instance until its init isn't complete - i.e. until it successfully responds to the warmup request. So you can optimize for this case while still playing it safe (even if not very efficient) for the rare cases when instances still start up cold and can get multiple requests in parallel.
Warmup requests aren't supported in the flexible environment, but:
To warm up your app, implement a health check handler that only
returns a ready status when the application is warmed up.
For example, you can create a readiness check that returns a ready
status after initializing the cache so your app won't receive traffic
until it is ready.

Each instance in the application is a separate interpreter, so globals need to be initialised per instance.
If initialisation is costly, but the computed value doesn't change frequently it may be worth storing the value in memcache, the datastore, a database or some other globally available store. Retrieval from memcache is fast, but persistence is not guaranteed, so you may need to re-run the initialisation from time to time. Retrieval from the datastore or a database is usually slower, but persistence is guaranteed in normal circumstances.
As dhauptman observes in the comments, this article contains some advice on lazy-loading global variables.

Related

Does App Engine Flexible for Python support concurrent requests?

From the documentation on how GAE Flexible handles requests, it says that "An instance can handle multiple requests concurrently" but I don't know what this exactly means.
Let's say my application can process a single request every 60 seconds.
After starting to process the initial request, will another request (or 3) that occur say 30 seconds after (so halfway done with the first request), be handled by the same instance, or will it trigger autoscaling and spin up more instances to handle those new requests? This situation assumes that CPU utilization for the first request is still below the scaling CPU-utilization threshold.
I'm worried that because it takes my instance 60 seconds to process a single request and I will be receiving multiple requests at a time, that I'll be inefficiently triggering autoscaling even if there is enough processing power to handle additional requests on the same instance. Is this how it works? I would ideally like to be able to multi-thread my processing and accept additional requests on the same instance while still under the CPU utilization threshold.
The documentation for concurrent requests is scarce for the Flexible environment unlike the Standard environment so I want to be sure.
Perhaps 'number of workers' is the config setting you're looking for:
https://cloud.google.com/appengine/docs/flexible/python/runtime#recommended_gunicorn_configuration
Gunicorn uses workers to handle requests. By default, Gunicorn uses sync workers. This worker class is compatible with all web applications, but each worker can only handle one request at a time. By default, gunicorn only uses one of these workers. This can often cause your instances to be underutilized and increase latency in applications under high load.
And it sounds like you've already seen that you can specify the cpu utilization threshold:
https://cloud.google.com/appengine/docs/flexible/python/reference/app-yaml#automatic_scaling
You can also use something other than gunicorn if you prefer. Here's one of their example's where they use Honcho instead:
https://github.com/GoogleCloudPlatform/getting-started-python/blob/master/6-pubsub/app.yaml

Google App Engine for long running but low CPU tasks, or long-polling?

App Engine has been great for requests that process quickly with no external API calls to databases or caches or third-party resources, but we've found that introducing any sort of "longer running" component or external latency (for example in a HTTP POST operation that runs asynchronously in the background and might take a second or two to process a few more intense database queries... totally invisible and OK from a UX perspective on the client-side because it's asynchronous but expensive to App Engine billing since it's long running) ... the "instance hours" compound and drive costs up considerably.
These sorts of expense inducing situations where a request is literally just waiting for a response from an external resource and requiring almost zero CPU during their idling seem avoidable, but I'm not sure if it's avoidable with App Engine.
It's almost like a "long poll" where the response might be left open but doing nothing.
Is there a way to do this on App Engine without just paying an insane amount for instance hours, or would we be better off moving to Compute Engine or EC2? Does it scale automatically based on CPU load, or is it based solely on open and perhaps inactive requests in total count? — threadsafe is indeed enabled.
There are really two ways to go about this one (top of mind).
Use Task Queues!
If the work doesn't need to be exactly at the same time of the request, this is exactly what [task queues] in App Engine are for. They allow you to put a job on a queue, and have another module pick up the work. They're kind of great because you can separately scale your front end and back end processes.
If that doesn't work....
Use App Engine Flexible
Under the hood App Engine Flexible is just running GCE instances. The cost structure is entirely different, since you persistently have a VM running in the background serving your requests.
Hope this helps!
What you're really worried about here is how App Engine scales your instances. Because many of your requests require few resources, your app might be able to handle many more concurrent requests on a single instance than normal. You can look into parameters that shape scaling here. Of particular interest:
max_concurrent_requests The number of concurrent requests an automatic scaling instance can accept before the scheduler spawns a new instance (Default: 8, Maximum: 80).
There is a danger here, where an instance may fill up with non-long-polling requests and become overburdened. To prevent that, you could isolate your long-polling requests into their own service and set its scaling parameters separately from the rest of your app.

How to programmatically look up google app instances

I have implemented instance mem-caches because we have very static data and the memcache is not very reliable and rather slow compared to an instance cache.
However there is some situations where I would like to invalidate the instance caches. Is there any way to look them up?
Example
Admin A updates a large gamesheet on instance A and that instance looks up all other instances and update the data using a simple REST api.
TL;DR: you can't.
Unlike backends, frontend instances are not individually addressable; that is, there is no way for you to make a RESTy URLFetch call to a specific frontend instance. Even if they were, there is no builtin mechanism for enumerating frontend instances, so you would need to roll your own, e.g. keeping a list of live instances in the datastore and adding to it in a warmup request and removing on repeated connect failure. But at that point you've just implemented a slower, more costly, and less available memcache service.
If you moved all the cache services to backends (using your instance-local static, or, for instance, running a memcached written in Go as a different app version), it's true you would gain a degree of control (or at least transparency) regarding evictions. Availability, speed, and cost would still likely suffer.

What is a Google App Engine instance?

I am trying to estimate the monthly costs for having GAE for in-app store and I do not really understand what is an instance and what can I do within one instance.
Can I just have one instance with multiple threads to deal with multiple clients? And as I have 28 hours of free instance per app per day (http://cloud.google.com/pricing/), does it mean that I would not pay for my server app running all the time?
An instance is an instance of a virtual server, running your code, that is able to serve requests to clients. This is usually done in parallel (Goroutines, Java threads, Python threads with 2.7) for most efficient usage of available resources.
Response times depends on what you're doing in your code, and it's usually IO dependent. If you have a waterfall of serial database lookups, it takes longer than if you only have a single multiget and perhaps an async write.
Part of the deal with GAE is that Google handles the elasticity for you. If there are a lot of connections waiting, new instances will start as needed (until your quota is exhausted). That means it can be difficult to estimate cost upfront, because you don't know exactly how efficient your code is and how much resources you'll need. I recommend a scheme where more usage means more income, and income per request is higher than cost per request. :)
You can tweak settings, saying you want requests to wait in queue, or always have a couple of spare instances ready to serve new requests, which will affect cost for you and response times for users.
In an IaaS scenario you could say that you will use five instances and that's the cost, but in reality you might need only 1 at night local time, and 25 the rest of the day, which means your users would most likely see dropped connections or otherwise have a negative user experience.
A free instance is normally able to handle test traffic during development without exhausting the quota.
Well AppEngine may decide you need to have more than one instance running to handle the requests and so will start another one. You won't be able to limit it to one running instance. In fact, it's sometimes unclear why AE starts another instance when it seems like the requests are low, but it will if it decides it needs another warm instance to be ready to handle requests if the serving instance(s) are too near their limit.

Addressing Backends

Google says at Addressing Backends chapter that without targeting an instance by number, App Engine selects the first available instance of the backend. That makes me wondering – what is that “first available instance”? Is it the instance #1, or is it picked by some other methods?
The exact behavior of this depends on if your instances are dynamic or resident.
For dynamic instances, the request goes to the first instance that can handle the request immediately. If there are no instances that can handle the request immediately, the request is queued or a new instance is started, depending on queueing settings.
For resident instances, the request is sent to the least-loaded backend instance.
The reason for the different behaviors is to make the best use of your instances: resident instances are there anyway, so they're utilized equally, while dynamic instances are spawned only as needed, so the scheduler tries to avoid spinning up new ones if it can.

Resources