I have noticed a recent surge in instance spawning on GAE.
In my app.yaml I have clearly defined that max two instances should be created at a time.
application: xxx
version: 1-6-0
runtime: python27
api_version: 1
instance_class: F2
automatic_scaling:
max_idle_instances: 2
threadsafe: true
However the dashboard is showing 4 instances and the bills are going up. How can I stop this madness? :)
I did a lot of research into this.
F instances are automatically scaled. There is no way to limit that. Hence it makes sense to move the actual work away from frontend instances and put that into a backend instance (B1 or B2). The latter provides another 8 hours free quota.
The real challenge is to re-architect the app to use a default app.yaml for web statics, a mobile.yaml for mobile requests with shorter min_pending_latency and a backend.yaml (B2) instance for handling the tasks and calculations.
All this needs to be routed properly via a dispatch.yaml. In this file you can specify which url endpoint will be handled effectively by which module.
Best way to understand it, is to look at this excellent example from GAE:
It makes sense trying to make it work first on local environment, before trying anything on remote server.
dev_appserver.py dispatch.yaml app.yaml mobile.yaml backend.yaml
Also this official documentation explains some of the above in more detail.
Its pretty impressive, what can be achieved with GAE.
max_idle_instances
The maximum number of IDLE instances that App Engine should maintain
for this version.
It seems that it is not currently possible to set the max number of instances for automatic scaling module. As #DoIT suggests, you can set spending limit, however, keep in mind the below.
When an application exceeds its daily spending limit, any operation
whose free quota has been exhausted fails.
So if you need to control somehow the total number of instances and keep your service running, I see the following possibilities.
Change your scaling type to basic and set max_instances parameter as you like
Keep automatic scaling type and increase min_pending_latency and max_concurrent_requests parameters (multi-threading has to be enabled)
You can find more details here.
The '''max_idle_instances''' set the max number of idle instances, i.e. Instances that are waiting for traffic spike. From your screenshot, it looks like all instances are getting traffic, so it looks ok to me. You can set max daily budget, if you want to control your spend on GAE.
its possible that some of your requests are taking too long to complete, and thus cause new instances to be spawned. You could work around this (I'm told, but have yet to try it myself) by setting a high value for your min_pending_latency property. this could hurt your latency a little, but would also limit the rate of instance-spawning.
Related
Is there a parameter that can be used in .yaml file, which can turn off the google app engine running instance when idle for a specified time? The intention is to reduce the instance hours hence billing.
There is no option in app.yaml flex environment to stop instance if it is idle.
Flex should have atlease 1 instance running.
If you want to be billed for an instance, stop the instance manually or if you know the certain time when your app is not being used (e.g 6pm to 6am next day), you can schedule to stop / start instance version.
gcloud app versions stop v1
There is no app.yaml element that can stop an App Engine instance based on a condition for a specific amount of time.
The closest thing you can do to reduce costs using the app.yaml file, is to specify a cheaper, albeit less potent Instance Class and / or reducing the resources you assign to the instance, (depending on whether you’re using the standard or flexible environment respectively), as these are part of what you’re billed for.
Reducing the amount of instances you need is another approach; this can be done by lowering the value of max_instances and / or max_idle_instances in standard, and max_num_instances in flexible.
If you don’t want to be billed for an instance at all, you can stop the version associated to it with the gcloud command gcloud app versions stop. In standard you won’t be charged when it’s stopped as it’s not running, but in flexible you will still pay for the disk size despite it.
A tool that can help you anticipate and estimate costs is the Pricing Calculator, where you can enter your desired configuration and see what would the costs approximately be. Setting up budget alerts for when you reach a certain spending limit can be useful too. Similarly, in standard, you can set a spending limit, and when an application exceeds it, operations will consequently fail but you won't be billed for it.
I want to understand the difference between min-instances & min-idle-instances?
I saw documentation on https://cloud.google.com/appengine/docs/standard/java/config/appref#scaling_elements but I am not able to differentiate between the two.
My use case:
I want at least 1 instance always up, as otherwise in most of the cases GAE would take time in creating instance causing my requests to time out (in case of basic scaling).
It should stay up, no matter if there is traffic or not, and if a request comes it should immediately serve it. If request volume grows then it should scale.
Which one I should use?
The min-idle-instances make reference to the instances that are ready to support your application in case you receive high traffic or CPU intensive tasks, unlike the min_instances which are the instances used to process the incoming request immediately. I suggest you to take a look on this link to have a deeper explanation of idle instances.
Based on this, since your use-case is focused on serve the incoming requests immediately, I think you should rather go with the min_instances functionality and use the min-idle-instances only in case you want to be ready for sudden load spikes.
The min-instances configuration applies to dynamic instances while min-idle-instances applies to idle/resident instances.
See also:
Introduction to instances for a description of the 2 instance types
Why do more requests go to new (dynamic) instances than to resident instance? for a bit more details
min_instances: the minimum number of instances running at any time, traffic or no traffic, rain or shine.
min_idle_instances: the minimum of idle (or "unused") instances running over the currently used instances. Example: you automatically scaled to 5 app engine instances that are receiving requests, by setting min_idle_instances to 2, you will be running 7 instances in total, the 2 "extra" instances are idle and waiting in case you receive more load. The goal is that when load raises, your users don't have to wait the load time it takes to start up an instance.
IMPORTANT: you need to configure warmup requests for that to work
IMPORTANT2: you'll be billed for any instance running, idle or not. App engine is not cheap so be careful.
min_instances applies to the number of instances that you want to have running, from 0 (useful if you want to scale down when you don't receive traffic) to 1000. You are charged for the number of instances you have running, so, this is important to save costs.
For your case set this value to 1, as it's the most straightforward option.
I'm doing a prototype backend and in the near future I expect little traffic but while testing I consumed all my 300$ free trail.
How can I configure my app to consume the least possible resources? I need things like limiting the number of instances to 1, using a cheap machine, sleep whenever possible, I've read something about Client vs Backend intances.
With time I'll learn the config that best suits me, but now I need the CHEAPEST config to get going.
BTW: I am using managed-vms with Dart.
EDIT
I've been recommended to configure my app.yaml file, what options would you recommend to confront this issue?
There are two train of thought for your issue.
1) Optimization of code: This is very difficult for us as we are not privy to your App's usage and client-base and architecture. In general, it depends on what Google App Engine product you use the most, for example: Datastore API call (fetch, write, delete... etc...), BigQuery and Cloud SQL. Even after optimization, you can still incur a lot of cost depending on traffic.
2) Enforcing cheap operation: This is easier and I think this is what you want. You can manually enforce a daily budget (in your billing setup page) so the App never cost more than a certain amount per day. You can also artificially lower the maximum amount of idling instances to 0 and use the smallest instance possible (F1 for frontend).
For pricing details see this article - https://cloud.google.com/appengine/pricing#Billable_Resource_Unit_Costs
If you use managed VM -- you'll be billed for Compute Engine Instance prices, not for App Engine Instances, and, as I know, the minimum possible instance to use as Managed VM is "g1-small" which costs you $0.023 per hour full sustained usage (if it will be turned on all month), so you minimum bill will be 0.023 * 24 * 30 = $16.56 only for instance hours. Excluding disk and traffic. With minimum amount of datastore operations you may stay on free quota.
Every application consumes resources differently. To minimize your cost, you need to know what resources used the majority of your expenses and go from there.
If it is spent on extra instances that were just sitting there - then trim the number of instances to the minimum required and use a lower class instance. If you are seeing a lot of expense on datastore calls - then look at optimizing your entities and take advantage of memcache.
Lowest Cost for a simple app:
Use App Engine Standard. It scales to zero instances, so will not cost anything if there is no traffic. With App Engine Flex you will pay for the instance hours and the Flex (GCE) instances are bigger.
Use autoscaling with max instances, F1 instance class:
With autoscaling you do not need to guess how many instances you need. F1 are the smallest instances. Set the max instances in case you get DoS'd or more traffic than you can afford.
Stop Instances:
You can stop the App Engine versions when you do not expect the app to be used. The will be no charge for instance hours for either Standard or Flex. For Flex there will be disk charges. The app will be ready to go when you need it again.
App Engine Version Cleanup:
Versions are easy to create and harder to remove. Here is a post on project cleanup. See this post on App Engine cleanup
https://medium.com/google-cloud/app-engine-project-cleanup-9647296e796a
I am terribly worried why my Google App Engine Application consumes super fast to its Front End Instance Hours. It's like 1 hour a day and then my Instance hour is reach its quota. Why I am experiencing this? I already read some articles regarding on this but it seems not solved. What is the right value of Idle Instance and Pending Latency? Thanks for helping guys.
In your Application Dashboard, go to Application Settings
Under performance, check the Frontend Instance Class - An F1 will cost you one instance hour and hour, F2 will be 2, etc. You probably want it set to F1.
Set pending and idle instances to automatic-automatic - this means appengine will scale down your instances to the minimum required.
Assuming you have low volume and no particular memory or CPU requirements, these settings will allow you to run all day for free.
If you are running any backends (check under the Main -> Backends ), these will consume instance hours as well based on the type (B1, B2 etc). You can make these more cost effective by making them dynamic.
My guess is that your instances are staying active for the default 12 hours after the last activity, which, for a Cloud SQL instance in a test environment, causes a lot of extra charges. I haven't yet determined how to programmatically shutdown instances, but you can change the default idle time before shutdown in the appengine-web.xml file (for Java), or the app.yaml file (for Python). I changed my ".xml" file so that my instances shut down after five minutes of inactivity by adding the following lines immediately before the final </appengine-web-app> line:
<basic-scaling>
<idle-timeout>5m</idle-timeout>
</basic-scaling>
I found this information on the following page: https://developers.google.com/appengine/docs/java/modules/
The Python information can be found here:
https://developers.google.com/appengine/docs/python/modules/
I understand that App Engine handles scaling automatically. However, in order to test drive some multi-instance / consolidated state scenarios, I'd like to instruct App Engine to fire up a minimum of 5 instances, even if load does not justify this.
Is there a way of doing this via app.yaml or Dashboard?
Try setting the Min Pending Latency to a very low number (e.g. 100ms), and then send a burst of requests to your app. Then the scheduler will start spinning up multiple instances to handle these requests.
You may need to use a tool for automated load testing - it will be difficult to achieve this manually.
This is what the min_idle_instances value controls.
In app.yaml:
automatic_scaling:
min_idle_instances: 5
It is possible to use either automatic_scaling or manual_scaling. With automatic scaling, you can only influence how many instances are started, with manual scaling, you can decide how many instances you wish to use. Place the following in your app.yaml, or module yaml file:
manual_scaling:
instances: 5