Google says in the Addressing Backends chapter that without targeting an instance by number, App Engine selects the first available instance of the backend. That makes me wonder: what is that "first available instance"? Is it instance #1, or is it picked by some other method?
The exact behavior depends on whether your instances are dynamic or resident.
For dynamic instances, the request goes to the first instance that can handle the request immediately. If there are no instances that can handle the request immediately, the request is queued or a new instance is started, depending on queueing settings.
For resident instances, the request is sent to the least-loaded backend instance.
The reason for the different behaviors is to make the best use of your instances: resident instances are there anyway, so they're utilized equally, while dynamic instances are spawned only as needed, so the scheduler tries to avoid spinning up new ones if it can.
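The two routing rules described above can be sketched as a toy model. This is only an illustration of the stated behavior, not App Engine's actual scheduler: the Instance class, its capacity field, and the tie-break on instance number are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    number: int           # instance index within the backend
    active_requests: int  # requests currently being handled
    capacity: int = 1     # hypothetical: how many requests it can take at once


def pick_dynamic(instances: List[Instance]) -> Optional[Instance]:
    """Dynamic rule: first instance that can handle the request immediately."""
    for inst in instances:
        if inst.active_requests < inst.capacity:
            return inst
    # No instance is free: the request would be queued or a new
    # instance started, depending on queueing settings.
    return None


def pick_resident(instances: List[Instance]) -> Instance:
    """Resident rule: least-loaded instance (ties go to the lowest number,
    which is an assumption of this sketch)."""
    return min(instances, key=lambda inst: (inst.active_requests, inst.number))


pool = [
    Instance(0, active_requests=2, capacity=2),
    Instance(1, active_requests=1, capacity=2),
    Instance(2, active_requests=0, capacity=2),
]
print(pick_dynamic(pool).number)   # 1: first instance with spare capacity
print(pick_resident(pool).number)  # 2: instance with the fewest requests
```

In this model the same pool gives different answers under the two rules, which is the point of the distinction: neither rule means "always instance #1".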
In the App Engine Dashboard, in the Summary combo box, I chose Instances. It shows these values: created, active.
I don't understand what created instances and active instances mean.
Are created instances idle instances?
Are active instances dynamic instances?
Why is created instances 3 but active instances 1 when my system fails?
Warning:
''While handling this request, the process that handled this request was found to be using too much memory and was terminated. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may have a memory leak in your application or may be using an instance with insufficient memory. Consider setting a larger instance class in app.yaml.''
Thanks
Created instances are the ones your application has started. They are not necessarily serving traffic and can also be idle. Instances are created depending on the instance scaling type you specified in your app.yaml.
Active instances are those instances that are serving traffic or have served traffic at a given timeframe.
See How Instances are Managed in App Engine for a detailed explanation of GAE instances.
The warning you received usually means the instance exceeded the maximum memory for its configured instance_class. You might need to specify a higher instance class or use max_concurrent_requests to optimize your instances and properly handle requests.
You could also configure the maximum and minimum number of instances in your app.yaml depending on how much traffic you would like your application to handle.
My Python API initializes a global variable which takes about 10 seconds to fully initialize before the server starts running. I'm wondering whether this same initialization is required each time GAE starts a new instance, or whether I can access the same variable across multiple instances.
This answer is just complementary to the other mentioned approaches, in most if not all cases they can be combined.
If you're in the standard environment you can take advantage of the warmup requests to well... warm (most of) your instances up before real traffic hits them.
Multithreading complexity doesn't really matter in such cases, since you know that no other request can hit the instance until its init is complete, i.e. until it successfully responds to the warmup request. So you can optimize for this case while still playing it safe (even if not very efficiently) for the rare cases when instances still start up cold and can get multiple requests in parallel.
Warmup requests aren't supported in the flexible environment, but:
To warm up your app, implement a health check handler that only returns a ready status when the application is warmed up.
For example, you can create a readiness check that returns a ready status after initializing the cache so your app won't receive traffic until it is ready.
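Both ideas (a warmup handler in the standard environment, a readiness check in the flexible one) can be sketched as a plain WSGI app using only the standard library. Treat this as a minimal sketch under assumptions: /_ah/warmup is the warmup path in the standard environment, /readiness_check is assumed here as the readiness path (it is configurable in app.yaml), and the _cache contents are a hypothetical stand-in for the real initialization.

```python
import threading

_ready = threading.Event()     # set once initialization has finished
_init_lock = threading.Lock()  # protects the rare cold-start race
_cache = {}                    # hypothetical stand-in for expensive state


def _ensure_initialised():
    """Run the expensive setup once, even if cold requests arrive in parallel."""
    if _ready.is_set():
        return
    with _init_lock:
        if not _ready.is_set():
            _cache["squares"] = {n: n * n for n in range(1000)}
            _ready.set()


def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/plain")]

    if path == "/_ah/warmup":
        # Standard environment: no user traffic arrives until this returns.
        _ensure_initialised()
        start_response("200 OK", headers)
        return [b"warmed up"]

    if path == "/readiness_check":
        # Flexible environment: report ready only after initialization.
        if _ready.is_set():
            start_response("200 OK", headers)
            return [b"ready"]
        start_response("503 Service Unavailable", headers)
        return [b"not ready"]

    # Normal request: still safe if the instance started cold.
    _ensure_initialised()
    start_response("200 OK", headers)
    return [str(_cache["squares"][12]).encode()]
```

The lock only matters on a cold start; after a successful warmup every request takes the fast path through the is_set() check.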
Each instance in the application is a separate interpreter, so globals need to be initialised per instance.
If initialisation is costly but the computed value doesn't change frequently, it may be worth storing the value in memcache, the datastore, a database or some other globally available store. Retrieval from memcache is fast, but persistence is not guaranteed, so you may need to re-run the initialisation from time to time. Retrieval from the datastore or a database is usually slower, but persistence is guaranteed in normal circumstances.
As dhauptman observes in the comments, this article contains some advice on lazy-loading global variables.
I want to understand the difference between min-instances & min-idle-instances?
I saw documentation on https://cloud.google.com/appengine/docs/standard/java/config/appref#scaling_elements but I am not able to differentiate between the two.
My use case:
I want at least 1 instance always up; otherwise, in most cases GAE takes time to create an instance, causing my requests to time out (in the case of basic scaling).
It should stay up, no matter if there is traffic or not, and if a request comes it should immediately serve it. If request volume grows then it should scale.
Which one should I use?
The min-idle-instances setting refers to instances that are kept ready to support your application in case you receive high traffic or CPU-intensive tasks, unlike min_instances, which are the instances used to process incoming requests immediately. I suggest taking a look at this link for a deeper explanation of idle instances.
Based on this, since your use case is focused on serving incoming requests immediately, I think you should go with min_instances and use min-idle-instances only if you want to be ready for sudden load spikes.
The min-instances configuration applies to dynamic instances while min-idle-instances applies to idle/resident instances.
See also:
Introduction to instances for a description of the 2 instance types
Why do more requests go to new (dynamic) instances than to resident instance? for a bit more details
min_instances: the minimum number of instances running at any time, traffic or no traffic, rain or shine.
min_idle_instances: the minimum number of idle (or "unused") instances running on top of the instances currently in use. Example: you have automatically scaled to 5 App Engine instances that are receiving requests. By setting min_idle_instances to 2, you will be running 7 instances in total; the 2 "extra" instances are idle and waiting in case you receive more load. The goal is that when load rises, your users don't have to wait for an instance to start up.
IMPORTANT: you need to configure warmup requests for that to work
IMPORTANT2: you'll be billed for any instance running, idle or not. App Engine is not cheap, so be careful.
min_instances applies to the number of instances that you want to have running, from 0 (useful if you want to scale down when you don't receive traffic) to 1000. You are charged for the number of instances you have running, so, this is important to save costs.
For your case set this value to 1, as it's the most straightforward option.
App Engine has been great for requests that process quickly with no external calls to databases, caches, or third-party resources. But we've found that once we introduce any sort of "longer running" component or external latency, the "instance hours" compound and drive costs up considerably. One example is an HTTP POST operation that runs asynchronously in the background and might take a second or two to process a few more intense database queries: totally invisible and OK from a UX perspective on the client side because it's asynchronous, but expensive in App Engine billing because it's long running.
These sorts of expense-inducing situations, where a request is literally just waiting for a response from an external resource and needs almost zero CPU while idling, seem avoidable, but I'm not sure they're avoidable with App Engine.
It's almost like a "long poll" where the response might be left open but doing nothing.
Is there a way to do this on App Engine without just paying an insane amount for instance hours, or would we be better off moving to Compute Engine or EC2? Does it scale automatically based on CPU load, or is it based solely on open and perhaps inactive requests in total count? — threadsafe is indeed enabled.
There are really two ways to go about this one (top of mind).
Use Task Queues!
If the work doesn't need to happen at exactly the same time as the request, this is exactly what task queues in App Engine are for. They allow you to put a job on a queue and have another module pick up the work. They're kind of great because you can scale your front end and back end processes separately.
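The hand-off pattern can be illustrated with an in-process analogy. This sketch deliberately uses Python's standard queue and threading modules rather than the App Engine Task Queue API, so it shows only the shape of the idea: the handler enqueues and returns at once, and a separate worker does the slow part. All names here are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()  # stand-in for a task queue
results = []          # stand-in for whatever the back end writes to


def handle_request(payload):
    """Front end: enqueue the slow work and answer immediately."""
    jobs.put(payload)
    return "202 Accepted"


def worker():
    """Back end: pull jobs until a None sentinel arrives."""
    while True:
        payload = jobs.get()
        if payload is None:
            break
        results.append(payload.upper())  # stand-in for the slow processing


def run_demo():
    results.clear()
    t = threading.Thread(target=worker)
    t.start()
    status = handle_request("update scores")
    jobs.put(None)  # tell the worker to stop once the queue is drained
    t.join()
    return status, list(results)
```

With a real task queue the worker would live in a different module or service, which is what allows the two sides to scale independently.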
If that doesn't work....
Use App Engine Flexible
Under the hood App Engine Flexible is just running GCE instances. The cost structure is entirely different, since you persistently have a VM running in the background serving your requests.
Hope this helps!
What you're really worried about here is how App Engine scales your instances. Because many of your requests require few resources, your app might be able to handle many more concurrent requests on a single instance than normal. You can look into parameters that shape scaling here. Of particular interest:
max_concurrent_requests: The number of concurrent requests an automatic scaling instance can accept before the scheduler spawns a new instance (Default: 8, Maximum: 80).
There is a danger here, where an instance may fill up with non-long-polling requests and become overburdened. To prevent that, you could isolate your long-polling requests into their own service and set its scaling parameters separately from the rest of your app.
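A back-of-the-envelope estimate shows why this setting matters for mostly idle requests. The function below is a deliberately simplified model that assumes concurrency is the only scaling trigger; the real scheduler also weighs CPU and latency settings, and the figure of 200 open requests is made up for illustration.

```python
import math


def instances_needed(open_requests: int, max_concurrent_requests: int) -> int:
    """Instances required if concurrency were the only scaling trigger."""
    if max_concurrent_requests < 1:
        raise ValueError("max_concurrent_requests must be at least 1")
    return math.ceil(open_requests / max_concurrent_requests)


# 200 requests that are mostly waiting on an external resource:
for limit in (8, 80):
    print(limit, instances_needed(200, limit))
# 8 25
# 80 3
```

Under this model, raising the limit from 8 to 80 cuts the instance count from 25 to 3, which is also why an instance can become overburdened if the requests turn out not to be idle.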
I have implemented instance-level in-memory caches because we have very static data, and memcache is not very reliable and is rather slow compared to an instance cache.
However, there are some situations where I would like to invalidate the instance caches. Is there any way to look up the other instances?
Example
Admin A updates a large gamesheet on instance A, and that instance looks up all other instances and updates their data using a simple REST API.
TL;DR: you can't.
Unlike backends, frontend instances are not individually addressable; that is, there is no way for you to make a RESTy URLFetch call to a specific frontend instance. Even if they were, there is no builtin mechanism for enumerating frontend instances, so you would need to roll your own, e.g. keeping a list of live instances in the datastore and adding to it in a warmup request and removing on repeated connect failure. But at that point you've just implemented a slower, more costly, and less available memcache service.
If you moved all the cache services to backends (using your instance-local static, or, for instance, running a memcached written in Go as a different app version), it's true you would gain a degree of control (or at least transparency) regarding evictions. Availability, speed, and cost would still likely suffer.