Why Active: 0? I am running a lot of cron jobs and task queue work, so I would expect Active: 1 or Active: 2.
Billed Instances Estimate: 1.00. Does this mean I only have to pay for 1 instance, even though I see 2 instances running, with 700 and 62 requests respectively?
I see the values 0, 0.5, 1.0, 1.5, 2.0, 2.5 on the vertical axis. Why, and how can the number of instances be 0.5, 1.5, or 2.5?
My app.yaml is
automatic_scaling:
max_idle_instances: 1
min_idle_instances: 0
max_concurrent_requests: 80
target_cpu_utilization: 0.9
min_pending_latency: 500ms
How can I set the maximum number of instances to 1?
I do not want to have 2 instances, because 1 instance running 24 hours a day equals 24 instance hours (within the free tier).
The graph can be confusing; similar questions popped into my head at the beginning as well. So I watched the graphs and the numbers on the summary page closely while running tests for more than a month, and compared the projections from those observations with the actual bill I received. I concluded that the graphs aren't very precise; I trust the numbers more. I only check the graphs to get a feel for traffic patterns, and I mostly disregard their estimates for billing purposes.
Another thing I noticed is that GAE doesn't actually kill idle instances aggressively or right away; it just stops taking them into account for billing.
As for setting the max number of instances - the capability has been recently added. From Scaling elements:
max_instances
Optional. Specify a value between 0 and 2147483647, where zero
disables the setting. This parameter specifies the maximum number of
instances for App Engine to create for this module version. This is
useful to limit the costs of a module.
Important: If you use appcfg from the App Engine SDK for Python to deploy, you cannot use this parameter in your app.yaml. Instead,
set the parameter as described in Setting Autoscaling Parameters in
the API Explorer, or by using the App Engine Admin API.
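If you deploy with gcloud rather than appcfg, the parameter can typically be set directly in app.yaml. A sketch based on the asker's configuration (the max_instances line is the addition; the value is illustrative):

```yaml
automatic_scaling:
  max_idle_instances: 1
  min_idle_instances: 0
  max_instances: 1   # hard cap on instances App Engine will create for this version
```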
If you are only using 1 instance no matter what, you might as well use manual scaling.
Sometimes App Engine will keep an instance alive for a variety of reasons, such as traffic prediction. You are not billed for idle instances that you did not provision (beyond the 15-minute shutdown window). The number of billable instances is often not the same as the number of created or active instances. The graph is mostly meant for monitoring traffic; it isn't really designed for cost calculation (it's a line graph, so it's hard to calculate cost from it without doing calculus). It's best to simply use your bill every cycle to track your actual usage.
Related
Is there a parameter that can be used in the .yaml file to shut down a running Google App Engine instance when it has been idle for a specified time? The intention is to reduce instance hours and hence billing.
There is no option in the flexible environment's app.yaml to stop an instance when it is idle.
Flex must have at least 1 instance running.
If you do not want to be billed for an instance, stop the instance manually, or, if you know the times when your app is not being used (e.g. 6pm to 6am the next day), you can schedule the version to be stopped and started:
gcloud app versions stop v1
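One hedged sketch of such a schedule, as a crontab on a separate machine with the gcloud SDK installed and authenticated (the service and version names are illustrative):

```shell
# Stop the version at 18:00 and start it again at 06:00 (server time).
0 18 * * * gcloud app versions stop v1 --service=default --quiet
0 6  * * * gcloud app versions start v1 --service=default --quiet
```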
There is no app.yaml element that can stop an App Engine instance based on a condition for a specific amount of time.
The closest thing you can do to reduce costs using the app.yaml file is to specify a cheaper, albeit less powerful, instance class and/or to reduce the resources you assign to the instance (depending on whether you're using the standard or flexible environment, respectively), as these are part of what you're billed for.
Reducing the number of instances you need is another approach; this can be done by lowering the value of max_instances and/or max_idle_instances in standard, and max_num_instances in flexible.
If you don't want to be billed for an instance at all, you can stop the version associated with it using the gcloud command gcloud app versions stop. In standard you won't be charged while it's stopped, as it's not running, but in flexible you will still pay for the disk size.
A tool that can help you anticipate and estimate costs is the Pricing Calculator, where you can enter your desired configuration and see approximately what the costs would be. Setting up budget alerts for when you reach a certain spending threshold can be useful too. Similarly, in standard, you can set a spending limit; when the application exceeds it, operations will fail, but you won't be billed for them.
Since moving to the GAE Go 1.11 runtime, we have noticed that the number of instances is much higher. When I dug into the problem, it seemed that GAE was not handling requests concurrently.
Here is a very light module of the frontend settings:
automatic_scaling:
min_idle_instances: 1
max_idle_instances: 2
min_pending_latency: 0.030s
max_pending_latency: automatic
max_concurrent_requests: 80
target_throughput_utilization: 0.95
And with about 50 requests per second, GAE spun up 2 active instances. Each has about 25 QPS and the average latency is under 20ms. Even the chart shows the instances aren't really busy.
What is in the settings that would cause this issue?
I don't think the Go 1.9 runtime has this issue. And the documentation says the Go 1.11 runtime ignores the max_concurrent_requests setting, which should make it perform even better.
The max_pending_latency: automatic setting will make your app scale up if pending latency goes above 30ms. If, in the current situation with 2 instances, your average latency is under 20ms, it is possible that initially it went over 30ms for a short period, which triggered the scaling. If you do not want this to happen, you can always set max_pending_latency manually, with a value above 30ms.
Regarding the comparison with Go 1.9, it is known that Go 1.11 consumes slightly more RAM and CPU power than its predecessor, so this would be expected.
In conclusion, I do not think that what is happening in your situation is an issue; it looks like normal behavior. If you disagree, you can provide your whole, sanitized app.yaml file and I will look deeper to see if anything is wrong and edit my answer.
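In app.yaml, setting max_pending_latency manually would look like this (the 100ms value is just an example above the 30ms threshold):

```yaml
automatic_scaling:
  max_pending_latency: 100ms   # explicit cap instead of "automatic"
```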
Situation:
My project consists mostly of automated tasks.
My GAE (standard environment) app has 40 cron jobs like this, all running on the default module (frontend):
- description: My cron job Nth
  url: /mycronjob_n/  # n is the nth cron job
  schedule: every 1 minutes
Each of the cron jobs looks like this:
@app.route('/mycronjob_n/')
def mycronjob_n():
    for i in range(100):
        pram = prams[i]
        options = TaskRetryOptions(task_retry_limit=0, task_age_limit=0)
        deferred.defer(mytask, pram, _retry_options=options)
where mytask is:
def mytask(pram):
    # Do some loops, read and write the datastore, call APIs;
    # I guess this takes less than 30 seconds.
    return 'Task finished'
Problem:
As the title of the question says, I am running out of RAM. Frontend instance hours are increasing to 100 hours.
My possibly wrong assumptions:
A deferred task runs in the background because it is not triggered by a user request when visiting the website; therefore it will not be counted as a request.
I broke my cronjob_n into small separate tasks because I thought it would reduce the running time of each cronjob_n and therefore REDUCE the instance's RAM consumption.
My questions (purpose: keep the frontend/backend instance hours as low as possible; I can accept latency):
Is a deferred task counted as a request?
How many requests do I have in 1 minute?
40 requests for mycronjob_n
or
40 requests for mycronjob_n x 100 mytask = 4000?
If 3-4 instances cannot handle 4000 requests, why doesn't GAE add 10 to 20 more F1 instances and then shut them down when idle? I set autoscaling in app.yaml. I don't see the point of GAE's autoscaling here, as advertised.
What is the best way to optimize my app?
If a deferred task is counted as a request, it is meaningless to split mycronjob_n into small separate tasks, right? I mean, my current method is the same as:
@app.route('/mycronjob_n/')
def mycronjob_n():
    for i in range(100):
        pram = prams[i]
        options = TaskRetryOptions(task_retry_limit=0, task_age_limit=0)  # unused here
        mytask(pram)  # call mytask directly instead of deferring it
Here, my app would have 40 requests per minute, each request running for 100 x 30s = 3000s? So will this approach also run out of memory?
Should I create a backend service running on an F1 instance and put all cron jobs on that backend service? I heard that a request there can run for up to 24 hours.
If I change the default service's instance class from F1 to F2 or F3, will I still get the 28 free hours? I heard the free tier applies to F1 only. And will my backend service get the 9 free hours if it runs on B2 instead of B1?
My regret:
- I rather regret choosing GAE for this project. I chose it because it has a free tier, but I have realized that the free tier is only good for hobby/testing purposes. When I run a real app, the cost increases so fast that GAE feels expensive. Datastore reads and writes are expensive even though I tried my best to optimize them. The frontend hours are also always high. I am paying 40 USD per month for GAE. With 40 USD per month, could I get a better server from Heroku or DigitalOcean? What do you think?
Yes, task queue requests (deferred included) are also requests, they just can run longer than user requests. And they need instances to serve them, which count as instance hours. Since you have at least one cron job running every minute - you won't have any 15 minute idle interval allowing your instances to shut down - so you'll need at least one instance running at all times. If you use any instance class other than F1/B1 - you'll exceed the free instance hours quota. See Standard environment instances billing.
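A quick worked example of that quota math. The multipliers follow the instance billing documentation (F1 is the baseline; F2 and F4 bill at 2x and 4x); the helper function is just for illustration:

```python
# App Engine bills instance classes as multiples of F1 hours.
F1_MULTIPLIER = {"F1": 1, "F2": 2, "F4": 4}
FREE_F1_HOURS_PER_DAY = 28  # standard environment frontend free quota

def daily_billed_f1_hours(instance_class, hours_running=24.0):
    """Convert wall-clock hours on a given class into billed F1 hours."""
    return F1_MULTIPLIER[instance_class] * hours_running

# A cron job every minute keeps at least one instance up 24h/day:
print(daily_billed_f1_hours("F1"))  # -> 24.0, fits within the 28h free quota
print(daily_billed_f1_hours("F2"))  # -> 48.0, exceeds the free quota
```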
You seem to be under the impression that the number of requests is what's driving your costs up. It's not, at least not directly. The culprit is most likely the number of instances running.
If 3-4 instances can not handle 4000 requests, why doesnt GAE add 10
to 20 F1 instances more and then shut down if idle?
Most likely GAE does exactly that - spawns several instances. But you keep pumping requests every minute, they don't reach an idle state long enough, so they don't shut down. Which drives your instance hours up.
There are 2 things you can do about it:
stagger your deferred tasks so they don't all need to be handled at the same time. Fewer instances (maybe even a single one?) may be needed to handle them in that case. See Combine cron jobs to reduce number of instances and Preventing Google App Engine Cron jobs from creating multiple instances (and thus burning through all my instance hours)
tune your app's scaling configuration (the range is limited though). See Scaling elements.
You should also carefully read How Instances are Managed.
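The staggering suggestion can be sketched as follows. `_countdown` is the documented Task Queue option that delays execution by N seconds; `stagger_offsets` is a hypothetical helper, not part of any API:

```python
# Spread a batch of deferred tasks over a time window instead of
# enqueueing all of them for immediate execution.

def stagger_offsets(n_tasks, window_seconds):
    """Return a countdown (in seconds) for each task so the batch is
    spread evenly across the window."""
    step = float(window_seconds) / max(n_tasks, 1)
    return [int(i * step) for i in range(n_tasks)]

# In the cron handler this would become (App Engine only, not runnable here):
#   for i, offset in enumerate(stagger_offsets(100, 50)):
#       deferred.defer(mytask, prams[i], _countdown=offset)

print(stagger_offsets(4, 8))  # -> [0, 2, 4, 6]
```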
Yes, you only pay for what exceeds the free quota, regardless of the instance class. Billing is in F1/B1 units anyway; from the billing link above:
Important: When you are billed for instance hours, you will not see any instance classes in your billing line items. Instead, you will
see the appropriate multiple of instance hours. For example, if you
use an F4 instance for one hour, you do not see "F4" listed, but you
see billing for four instance hours at the F1 rate.
About the RAM usage, splitting the cron job in multiple tasks isn't necessarily helping, see App Engine Deferred: Tracking Down Memory Leaks
Finally, comparing GAE costs with Heroku or DigitalOcean isn't an apples-to-apples comparison: GAE is a PaaS, not an IaaS, so it's IMHO expected to be more expensive. Choosing one or the other is really up to you.
My Java app runs on the Standard Google App Engine (GAE) and is configured to have 1 minimum instance and 1 maximum instance. It is also configured to have 1 minimum idle instance which allows the single instance to run non-stop. I ran a timer for 1 hour and then checked how many instance hours have elapsed. It indicates slightly over 2 hours. How is this possible when only a single instance is running?
From your configuration you should actually be having 2 instances running:
one resident instance, due to the minimum idle instance configuration. This serves only sudden transient traffic peaks while GAE spins up the necessary dynamic instances, see min-idle-instance on GAE/J and Why do more requests go to new (dynamic) instances than to resident instance?
one dynamic instance, due to the min/max 1 instance configs, handling the regular traffic
Note: the instance class also matters (but probably it's not your case here). From Standard environment instances:
Important: When you are billed for instance hours, you will not see any instance classes in your billing line items. Instead, you will
see the appropriate multiple of instance hours. For example, if you
use an F4 instance for one hour, you do not see "F4" listed, but you
see billing for four instance hours at the F1 rate.
I have a php AppEngine app. No scaling config, I'm just using the defaults.
Yesterday it scaled up to two instances while I was doing some heavy testing. It has only handled one or two requests since then, but it is still running 2 instances. How long does it take to scale back down? That was more than 24 hours ago. Thanks!
They usually spin down after 15 minutes. However it will depend heavily on your application profile, and is not guaranteed.
You can configure the maximum idle instances and then they guarantee not to bill you above that threshold.
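Assuming a standard environment app, the corresponding app.yaml fragment would be something like (value illustrative):

```yaml
automatic_scaling:
  max_idle_instances: 1   # idle instances above this threshold aren't billed
```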