Please explain App Engine Instances parameters - google-app-engine

In the App Engine Dashboard, in the Summary combobox, I chose Instances; it shows these values: created, active.
I don't understand what created instances and active instances mean.
Are created instances idle instances?
Are active instances dynamic instances?
Why do I see 3 created instances but only 1 active instance, and why does my system then fail?
Warning:
''While handling this request, the process that handled this request was found to be using too much memory and was terminated. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may have a memory leak in your application or may be using an instance with insufficient memory. Consider setting a larger instance class in app.yaml.''
Thanks

Created instances are the instances your application has started at a given point; they are not necessarily serving traffic and can also be idle. Instances are created depending on the instance scaling type you specified in your app.yaml.
Active instances are those instances that are serving traffic, or have served traffic, within a given timeframe.
See How Instances are Managed in App Engine for a detailed explanation of GAE instances.
The warning you received usually means an instance exceeded the maximum memory for its configured instance_class. You might need to specify a higher instance class, or use max_concurrent_requests to optimize your instances and handle requests properly.
You could also configure the maximum and minimum number of instances in your app.yaml, depending on how much traffic you would like your application to handle.
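As a rough sketch of how those settings fit together in the standard environment (the runtime and the numbers are illustrative assumptions, not recommendations):

```yaml
# app.yaml (standard environment) -- illustrative values only
runtime: python39          # assumed runtime; use whatever your app actually targets
instance_class: F4         # a larger class than the default F1, for more memory per instance

automatic_scaling:
  max_concurrent_requests: 40  # let each instance absorb more requests before new ones spin up
  min_instances: 1             # keep at least one instance running
  max_instances: 5             # cap the total number of instances to control cost
```

Raising instance_class addresses the out-of-memory warning directly; the scaling limits mainly control how many instances App Engine is allowed to create.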

Related

How to prevent downtime in App Engine Flex when instances are automatically restarted

Situation
custom runtime (Docker/Node) on App Engine Flex
manually scaled to a single instance, since we manage the resources ourselves (2 CPU / 6 GB RAM)
liveness and readiness checks are configured
as expected, VM instances are automatically restarted on a weekly basis to apply OS / system updates
this is visible in the Activity pane of the Google Cloud Console
Stackdriver logs confirm this activity (e.g. shutdown-script: INFO Starting shutdown scripts. and startup-script: INFO Starting startup scripts.)
no instance is available during these restarts, resulting in 503 errors when visiting the application running on the instance
Goal
to have some control on the amount of instances to prevent downtime
e.g. temporarily scale to 2 instances while 1 instance is restarting
keeping control of the available resources (cpu / ram)
Question
We've considered simply having 2 instances available at all times, but are worried both would be restarted at the same time since they are part of the same instance group.
What would allow us to keep everything up and running while still controlling the amount of instances / resources used?
I have a flex app with two instances running for similar reasons. For me, an instance will occasionally exceed memory limits and need to be restarted. Since I have a second instance, there should always be an instance available.
I hadn't considered the Google updates to my instances. I just checked my recent history, and Google restarted my two instances yesterday. The restarts were 7 minutes apart so, at least in this example, my users always had an instance available to them.
I suspect that Google does not simultaneously restart all of your instances. This would create a brief period of downtime for all flex customers, and nobody wants downtime for a cloud service.
UPDATE:
This is a guess, but I expect that when Google updates a flex instance, it creates a new instance and only shuts down the old instance after the new one is available. At least, if I were running Google, that is how I would do it. That way you have 100% uptime, and you only very briefly have an extra instance running. This would even work with a single flex instance.
Maybe you should try automatic scaling, as shown here: Scaling instances.
This allows your application to create instances automatically based on request rate, response latencies, and other application metrics. When one of your instances gets shut down, another instance can be created to cover for it, so your service won't be interrupted.
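As a rough idea of what that could look like for the setup described in the question (flexible environment, custom runtime, 2 CPU / 6 GB RAM), here is a hedged app.yaml sketch; the exact bounds are assumptions, not recommendations:

```yaml
# app.yaml (flexible environment) -- illustrative sketch only
runtime: custom
env: flex

resources:
  cpu: 2
  memory_gb: 6

automatic_scaling:
  min_num_instances: 2     # keep two instances so one can cover while the other restarts
  max_num_instances: 3     # cap scaling to keep resource usage predictable
  cpu_utilization:
    target_utilization: 0.6
```

With at least two instances, a weekly maintenance restart of one VM should not take the whole service offline, while max_num_instances keeps the resource footprint close to the original manual configuration.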

What is the difference between min-instances and min-idle-instances in google app engine?

I want to understand the difference between min-instances and min-idle-instances.
I saw documentation on https://cloud.google.com/appengine/docs/standard/java/config/appref#scaling_elements but I am not able to differentiate between the two.
My use case:
I want at least 1 instance always up; otherwise, in most cases GAE takes time creating an instance, causing my requests to time out (as happens with basic scaling).
It should stay up whether there is traffic or not, and when a request comes in it should serve it immediately. If request volume grows, it should scale.
Which one should I use?
min-idle-instances refers to instances that are kept ready to support your application in case you receive high traffic or CPU-intensive tasks, whereas min_instances sets the instances used to process incoming requests immediately. I suggest you take a look at this link for a deeper explanation of idle instances.
Based on this, since your use case is focused on serving incoming requests immediately, I think you should go with min_instances and use min-idle-instances only if you want to be ready for sudden load spikes.
The min-instances configuration applies to dynamic instances while min-idle-instances applies to idle/resident instances.
See also:
Introduction to instances for a description of the 2 instance types
Why do more requests go to new (dynamic) instances than to resident instance? for a bit more details
min_instances: the minimum number of instances running at any time, traffic or no traffic, rain or shine.
min_idle_instances: the minimum number of idle (or "unused") instances running on top of the instances currently in use. Example: if you have automatically scaled to 5 App Engine instances that are receiving requests, setting min_idle_instances to 2 means you will be running 7 instances in total; the 2 "extra" instances are idle and waiting in case you receive more load. The goal is that when load rises, your users don't have to wait for the time it takes to start up an instance.
IMPORTANT: you need to configure warmup requests for that to work
IMPORTANT2: you'll be billed for any instance running, idle or not. App engine is not cheap so be careful.
min_instances applies to the number of instances that you want to have running, from 0 (useful if you want to scale down when you don't receive traffic) to 1000. You are charged for the number of instances you have running, so this is important for keeping costs down.
For your case set this value to 1, as it's the most straightforward option.
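Putting the two settings together, a hedged app.yaml sketch for this use case in the standard environment (the runtime is an assumption; adjust the numbers to your traffic):

```yaml
# app.yaml (standard environment) -- illustrative values only
runtime: java11
inbound_services:
  - warmup                # enables warmup requests, needed for idle instances to be useful

automatic_scaling:
  min_instances: 1        # one instance stays up even with zero traffic
  min_idle_instances: 1   # one spare, already-warm instance kept ready for load spikes
```

min_instances: 1 alone covers the stated requirement; min_idle_instances is only worth paying for if sudden spikes are expected.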

When does Google App Engine start or stop an instance?

We have an App Engine app that handles an average of 0.5 requests per second, and seemingly all those requests can be handled by the same instance running a Go app as the main version.
However, sometimes App Engine kicks off a second instance (and sometimes even a third one), that doesn't seem to do anything past handling one or two requests. Here's an example.
Shutting down that instance manually doesn't seem to cause any harm, so my question is: why does App Engine not kill the instance after it has not received any requests for a while? (The above example had four requests in the past hour; often the requests/age ratio is even lower.)
Update:
A similar situation is when an instance is started on a different version. App Engine only seems to kill the instance after hours of not getting any requests.
Under Application Settings → Performance,
Idle Instances is set to Automatic – 20
Pending Latency is set to 150ms – 250ms
I wish I knew what controls if/when it kills idle instances, but I can't see any documentation of it.
To avoid excess instances starting, I think the main thing you can do here is increase the pending latency:
The Pending Latency slider controls how long requests spend in the pending queue before being served by an Instance of the default version of your application. If the minimum pending latency is high App Engine will allow requests to wait rather than start new Instances to process them. This can reduce the number of instance hours your application uses, but can result in more user-visible latency.
Even if you only average 4 requests/hour, if you happen to get two closely spaced requests, I suppose it's possible it would start a new instance.
You can also see some small amount of information in the logs about why it started a new instance.
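In the newer app.yaml-based configuration, the knobs discussed above have direct equivalents; a hedged sketch (values are illustrative, not recommendations):

```yaml
# app.yaml (standard environment) -- illustrative sketch of the pending-latency settings
automatic_scaling:
  min_pending_latency: 150ms   # wait at least this long in the queue before starting a new instance
  max_pending_latency: 500ms   # never let a request wait longer than this
  max_idle_instances: 1        # shut down surplus idle instances sooner
```

Raising min_pending_latency makes the scheduler more willing to queue a burst of requests on the existing instance instead of spinning up a new one.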
The "How Applications Scale" section of the Google App Engine documentation states:
Scaling in Instances
Each instance has its own queue for incoming requests. App Engine monitors the number of requests waiting in each instance's queue. If App Engine detects that queues for an application are getting too long due to increased load, it automatically creates a new instance of the application to handle that load.
App Engine also scales instances in reverse when request volumes decrease. This scaling helps ensure that all of your application's current instances are being used to optimal efficiency and cost effectiveness.
It also states you can "specify a minimum number of idle instances", and to "optimize for high performance or low cost" in the administration console.
Try setting the "Idle instances" field to something like 3 - 5, and "optimize for low cost" and see if that affects the instance kill time.

Addressing Backends

Google says in the Addressing Backends chapter that without targeting an instance by number, App Engine selects the first available instance of the backend. That makes me wonder: what is that "first available instance"? Is it instance #1, or is it picked by some other method?
The exact behavior of this depends on if your instances are dynamic or resident.
For dynamic instances, the request goes to the first instance that can handle the request immediately. If there are no instances that can handle the request immediately, the request is queued or a new instance is started, depending on queueing settings.
For resident instances, the request is sent to the least-loaded backend instance.
The reason for the different behaviors is to make the best use of your instances: resident instances are there anyway, so they're utilized equally, while dynamic instances are spawned only as needed, so the scheduler tries to avoid spinning up new ones if it can.

GAE Go - "This request caused a new process to be started for your application..."

I've encountered this problem for a second time now, and I'm wondering if there is any solution to this. I'm running an application on Google App Engine that relies on frequent communication with a website through HTTP JSON RPC. It appears that GAE has a tendency to randomly display a message like this in the logs:
"This request caused a new process to be started for your application,
and thus caused your application code to be loaded for the first time.
This request may thus take longer and use more CPU than a typical
request for your application."
It also resets all variables stored in RAM without warning. The same thing happens over and over no matter how many times I set the variables again or upload newer code to GAE, although incrementing the app version number seems to solve the problem.
How can I get more information on this behaviour, and how can I avoid it and prevent data loss in my Go applications on Google App Engine?
EDIT:
The variables stored in RAM are small classes of strings, bytes, bools and pointers. Nothing too complicated or big.
Google App Engine seems to "start a new process" within seconds of heavier use, which shouldn't be long enough for the application to be shut down for inactivity. The timespan between the application being uploaded to GAE, having its variables set, and a new process being created is less than a minute.
Do you realize that GAE is a cloud hosting solution that automatically manages instances based on load? This is its main feature and the reason people use it.
When load increases, GAE creates a new instance, which, of course, starts with all in-memory variables empty.
The solution is not to expect variables to be available: store them to permanent storage at the end of the request (session, memcache, datastore) and load them at the beginning of the request if they are not present.
You can read about GAE instances in their documentation here, check out the performance section:
http://code.google.com/appengine/kb/java.html
In your case of having small data available, if it's static then you can load it into memory on startup of a new instance. If it's dynamic data, you should be saving it to the database using their API.
My recommendation for keeping a GAE instance alive: either pay for the Always-On service or follow my recommendations for using a cron job here:
http://rwyland.blogspot.com/2012/02/keeping-google-app-engine-gae-instances.html
I use what I call a "prime schedule" of 3-, 7-, and 11-minute cron jobs.
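A minimal cron.yaml sketch of that keep-alive idea (the /keepalive URL is an assumed lightweight handler, not something from the original answer):

```yaml
# cron.yaml -- illustrative "prime schedule" keep-alive sketch
cron:
- description: keep-alive ping (3 min)
  url: /keepalive
  schedule: every 3 minutes
- description: keep-alive ping (7 min)
  url: /keepalive
  schedule: every 7 minutes
- description: keep-alive ping (11 min)
  url: /keepalive
  schedule: every 11 minutes
```

The staggered prime intervals keep the pings from aligning, so the instance rarely goes long without a request.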
You should consider using Backends if you want long running instances with resident memory.
