GAE instance hours a lot more than real hours

My Java app runs on the Standard Google App Engine (GAE) and is configured to have 1 minimum instance and 1 maximum instance. It is also configured to have 1 minimum idle instance, which allows the single instance to run non-stop. I ran a timer for 1 hour and then checked how many instance hours had elapsed. The console showed slightly over 2 hours. How is this possible when only a single instance is running?

From your configuration you should actually have 2 instances running:
one resident instance, due to the minimum idle instance configuration. This serves only sudden transient traffic peaks while GAE spins up the necessary dynamic instances, see min-idle-instance on GAE/J and Why do more requests go to new (dynamic) instances than to resident instance?
one dynamic instance, due to the min/max 1 instance configs, handling the regular traffic
Note: the instance class also matters (though it's probably not the issue in your case). From Standard environment instances:
Important: When you are billed for instance hours, you will not see any instance classes in your billing line items. Instead, you will see the appropriate multiple of instance hours. For example, if you use an F4 instance for one hour, you do not see "F4" listed, but you see billing for four instance hours at the F1 rate.
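The "slightly over 2 hours" reading matches the answer's explanation: two instances running for one hour. A minimal sketch of that arithmetic, assuming the F1/B1-unit multipliers below (1 for F1/B1, 2 for F2/B2, 4 for F4/B4); `CLASS_MULTIPLIER` and `billed_instance_hours` are hypothetical names, not part of any GAE API:

```python
# Hypothetical helper: billing is expressed in F1/B1 units, so a larger
# class counts as a multiple of F1/B1 hours (e.g. 1 F4 hour = 4 F1 hours).
CLASS_MULTIPLIER = {
    "F1": 1, "F2": 2, "F4": 4,
    "B1": 1, "B2": 2, "B4": 4,
}


def billed_instance_hours(instances, wall_clock_hours):
    """Return billed instance hours.

    instances: one instance class name per running instance.
    wall_clock_hours: how long all of those instances were running.
    """
    return sum(CLASS_MULTIPLIER[cls] for cls in instances) * wall_clock_hours


# One resident (min idle) F1 instance plus one dynamic F1 instance,
# running for the 1-hour timer from the question.
print(billed_instance_hours(["F1", "F1"], 1.0))  # 2.0
# The same pair on F4 would show up as 8 F1-equivalent hours.
print(billed_instance_hours(["F4", "F4"], 1.0))  # 8.0
```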

Related

How to prevent downtime in App Engine Flex when instances are automatically restarted

Situation
custom runtime (Docker/Node) on App Engine Flex
manually scaled to 1 single instance, as we manage the resources ourselves (2 CPUs / 6 GB RAM)
liveness and readiness checks are configured
as expected, VM instances are automatically restarted on a weekly basis to apply OS / system updates
this is visible in the Activity pane of the Google Cloud Console
Stackdriver logs confirm this activity (e.g. shutdown-script: INFO Starting shutdown scripts. and startup-script: INFO Starting startup scripts.)
no instance is available during these restarts, resulting in 503 errors when visiting the application running on the instance
Goal
to have some control over the number of instances to prevent downtime
e.g. temporarily scale to 2 instances while 1 instance is restarting
keeping control of the available resources (CPU / RAM)
Question
We've considered simply having 2 instances available at all times, but are worried both would be restarted at the same time since they are part of the same instance group.
What would allow us to keep everything up and running while still controlling the number of instances / resources used?

I have a flex app with two instances running for similar reasons. For me, an instance will occasionally exceed memory limits and need to be restarted. Since I have a second instance, there should always be an instance available.
I hadn't considered the Google updates to my instances. I just checked my recent history, and Google restarted my two instances yesterday. The restarts were 7 minutes apart so, at least in this example, my users always had an instance available to them.
I suspect that Google does not simultaneously restart all of your instances. This would create a brief period of downtime for all flex customers, and nobody wants downtime for a cloud service.
UPDATE:
This is a guess, but I expect that when Google updates a flex instance, it creates a new instance and shuts down the old instance only after the new one is available. At least, if I were running Google, that is how I would do it. That way you have 100% uptime and very briefly have an extra instance running. This would even work with a single flex instance.
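The point of a second instance is that it only helps if the two restarts do not overlap. A minimal sketch to check that, assuming each restart makes its instance unavailable for a fixed window of minutes; the 3-minute window below is an assumption, only the 7-minute gap comes from the answer, and `total_outage_minutes` is a hypothetical helper:

```python
def total_outage_minutes(windows, num_instances):
    """Count minutes (integer resolution) during which no instance is up.

    windows: (start, end) restart windows in minutes, end exclusive,
             one per restarted instance; instances without a window stay up.
    num_instances: total number of instances serving the app.
    """
    if not windows or len(windows) < num_instances:
        return 0  # at least one instance is never restarted
    first = min(start for start, _ in windows)
    last = max(end for _, end in windows)
    outage = 0
    for minute in range(first, last):
        down = sum(1 for start, end in windows if start <= minute < end)
        if down == num_instances:  # every instance is restarting at once
            outage += 1
    return outage


# Two instances restarted 7 minutes apart, each down for an assumed 3 minutes.
print(total_outage_minutes([(0, 3), (7, 10)], 2))  # 0
# A single instance restarted for 3 minutes: the app is down the whole time.
print(total_outage_minutes([(0, 3)], 1))  # 3
# Overlapping restarts of both instances still cause a short outage.
print(total_outage_minutes([(0, 3), (1, 4)], 2))  # 2
```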

Maybe you should try automatic scaling, described in Scaling instances.
This allows your application to automatically create instances based on request rate, response latencies, and other application metrics. When one of your instances gets shut down, another instance could be created to "cover" for the missing one. Thus, your service won't get interrupted.
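A rough way to reason about this is a proportional, CPU-target autoscaler clamped to a minimum and maximum instance count. The sketch below is a simplified model under that assumption, not Google's actual algorithm; the clamp is meant to mirror the flex `min_num_instances` / `max_num_instances` settings, and `desired_instances` is a hypothetical helper. Keeping the minimum at 2 is what leaves one instance serving while the other restarts:

```python
import math


def desired_instances(current, cpu_utilization, target_utilization,
                      min_instances, max_instances):
    """Simplified CPU-target autoscaler (an assumption, not Google's code).

    Scales the current instance count by observed/target CPU utilization,
    rounds up, and clamps the result to [min_instances, max_instances].
    """
    if current == 0:
        wanted = min_instances
    else:
        wanted = math.ceil(current * cpu_utilization / target_utilization)
    return max(min_instances, min(max_instances, wanted))


# Under load: 2 instances at 90% CPU with a 50% target -> scale out.
print(desired_instances(2, 0.9, 0.5, min_instances=2, max_instances=5))  # 4
# Quiet period: the minimum of 2 keeps a spare instance for restarts.
print(desired_instances(2, 0.1, 0.5, min_instances=2, max_instances=5))  # 2
```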

Exceeded soft memory limit of 243 MB with 307 MB after servicing 4330 requests total. Consider setting a larger instance class in app.yaml

Situation:
My project consists mostly of automated tasks.
My GAE (standard environment) app has 40 cron jobs like this, all running on the default module (frontend):
- description: My cron job Nth
  url: /mycronjob_n/ ###### Please note n is the nth cron job.
  schedule: every 1 minutes
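Each cron job handler looks like this:
from google.appengine.api.taskqueue import TaskRetryOptions
from google.appengine.ext import deferred

@app.route('/mycronjob_n/')
def mycronjob_n():
    # No retries for the deferred tasks.
    options = TaskRetryOptions(task_retry_limit=0, task_age_limit=0)
    for i in range(100):  # i = 0..99
        pram = prams[i]
        deferred.defer(mytask, pram, _retry_options=options)
    return 'OK'  # cron treats a non-2xx response as a failure
where mytask is:
def mytask(pram):
    # Do some loops, read and write the datastore, call an API
    # (I guess this takes less than 30 seconds).
    return 'Task finish'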
Problem:
As the title says, I am running out of RAM, and frontend instance hours are climbing to 100 hours.
My (possibly wrong) assumptions:
Deferred tasks run in the background because they are not requests a user sends when visiting the website. Therefore, I assumed they would not be counted as requests.
I split each mycronjob_n into smaller tasks because I thought this would reduce the running time of each mycronjob_n and therefore REDUCE the instance's RAM consumption.
My questions (goal: keep frontend/backend instance hours as low as possible; I can accept higher latency):
Is a deferred task counted as a request?
How many requests do I have per minute?
40 request of mycronjob_n
or
40 requests of mycronjob_n x 100 mytask = 4000
If 3-4 instances cannot handle 4000 requests, why doesn't GAE add 10 to 20 more F1 instances and then shut them down when idle? I set up autoscaling in app.yaml. I don't see the point of GAE's advertised autoscaling here.
What is the best way to optimize my app?
If a deferred task is counted as a request, it is pointless to split mycronjob_n into smaller tasks, right? I mean, my current method is the same as:
@app.route('/mycronjob_n/')
def mycronjob_n():
    for i in range(100):  # i = 0..99
        pram = prams[i]
        mytask(pram)  # call mytask directly, inside this request
    return 'OK'
In that case, would my app have 40 requests per minute, each running for 100 x 30 s = 3000 s? Would this approach also run out of memory?
Should I create a backend service running on an F1 instance and move all cron jobs to that backend service? I heard that a request there can run for 24 hours.
If I change the default service's instance class from F1 to F2 or F3, will I still get 28 free hours? I heard the free tier applies to F1 only. And will my backend service get 9 free hours if it runs on B2 instead of B1?
My regret:
- I regret choosing GAE for this project. I chose it because it has a free tier, but I realized the free tier is just for hobby/testing purposes. When you run a real app, the cost increases so fast that GAE starts to feel expensive. Datastore reads/writes are very expensive even though I tried my best to optimize them, and the frontend hours are always high. I am paying 40 USD per month for GAE. For 40 USD per month, maybe I could get a better server with Heroku or Digital Ocean? What do you think?

Yes, task queue requests (deferred included) are also requests; they can just run longer than user requests. And they need instances to serve them, which count as instance hours. Since at least one cron job runs every minute, your instances never get a 15-minute idle interval that would let them shut down, so you'll need at least one instance running at all times. If you use any instance class other than F1/B1, you'll exceed the free instance hours quota. See Standard environment instances billing.
You seem to be under the impression that the number of requests is what's driving your costs up. It's not, at least not directly. The culprit is most likely the number of instances running.
If 3-4 instances can not handle 4000 requests, why doesnt GAE add 10 to 20 F1 instances more and then shut down if idle?
Most likely GAE does exactly that: it spawns several instances. But because you keep sending requests every minute, the instances never stay idle long enough to shut down, which drives your instance hours up.
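Little's law gives a rough sense of why the instances never go idle: the average number of requests in flight equals the arrival rate times the time each request spends in the system. A back-of-the-envelope sketch, assuming every deferred task takes the full 30 seconds the asker guessed (so this is an upper bound) and that tasks arrive evenly over the minute; `concurrent_requests` is a hypothetical helper:

```python
def concurrent_requests(requests_per_minute, seconds_per_request):
    """Little's law: requests in flight = arrival rate x time in system."""
    return requests_per_minute * seconds_per_request / 60.0


cron_requests = 40            # 40 cron jobs, each firing every minute
deferred_tasks = 40 * 100     # each cron handler defers 100 tasks
print(cron_requests + deferred_tasks)               # 4040 requests per minute
# Only the deferred tasks are long-running (up to ~30 s each).
print(concurrent_requests(deferred_tasks, 30))      # 2000.0
```

With on the order of a couple of thousand requests in flight on average, every minute, there is never a quiet period in which the extra instances could be shut down.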
There are 2 things you can do about it:
stagger your deferred tasks so they don't all need to be handled at the same time. Fewer instances (maybe even a single one?) may then be needed to handle them. See Combine cron jobs to reduce number of instances and Preventing Google App Engine Cron jobs from creating multiple instances (and thus burning through all my instance hours)
tune your app's scaling configuration (the range is limited though). See Scaling elements.
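One way to stagger the deferred tasks is to give each one a different start delay instead of enqueueing all 100 at once; `deferred.defer` accepts a `_countdown` keyword (delay in seconds) for this. A minimal sketch, assuming it is acceptable for the workload that tasks start spread over the window; `staggered_countdowns` is a hypothetical helper:

```python
def staggered_countdowns(num_tasks, window_seconds):
    """Spread num_tasks start times evenly over window_seconds.

    Returns one countdown (whole seconds) per task. In the app, each value
    could be used as deferred.defer(mytask, pram, _countdown=value).
    """
    if num_tasks <= 0:
        return []
    step = window_seconds / num_tasks
    return [int(i * step) for i in range(num_tasks)]


countdowns = staggered_countdowns(100, 60)
print(countdowns[:5])   # first few delays, all within the first seconds
print(max(countdowns))  # last task starts just before the window ends
```

A wider window (or fewer cron jobs) lowers the number of tasks running at once, which is what lets fewer instances handle them.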
You should also carefully read How Instances are Managed.
Yes, you only pay for what exceeds the free quota, regardless of the instance class. Billing is in F1/B1 units anyway; from the billing link above:
Important: When you are billed for instance hours, you will not see any instance classes in your billing line items. Instead, you will see the appropriate multiple of instance hours. For example, if you use an F4 instance for one hour, you do not see "F4" listed, but you see billing for four instance hours at the F1 rate.
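Combining the F1/B1 billing units with the free quota answers the asker's F2/B2 questions. A minimal sketch, assuming the daily free quotas quoted in the question (28 frontend and 9 backend instance hours) and that the quota is deducted in F1/B1 units; `FREE_DAILY_HOURS` and `billable_hours` are hypothetical names:

```python
CLASS_MULTIPLIER = {"F1": 1, "F2": 2, "F4": 4, "B1": 1, "B2": 2, "B4": 4}
FREE_DAILY_HOURS = {"F": 28.0, "B": 9.0}  # figures quoted in the question


def billable_hours(instance_class, instance_count, hours_running):
    """Instance hours left to pay for after the free quota (one day)."""
    family = instance_class[0]  # "F" (frontend) or "B" (backend)
    used = CLASS_MULTIPLIER[instance_class] * instance_count * hours_running
    return max(0.0, used - FREE_DAILY_HOURS[family])


print(billable_hours("F1", 1, 24))  # 0.0: one F1 all day fits in the quota
print(billable_hours("F2", 1, 24))  # 20.0: F2 counts double
print(billable_hours("B2", 1, 24))  # 39.0
```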
As for RAM usage, splitting the cron job into multiple tasks doesn't necessarily help; see App Engine Deferred: Tracking Down Memory Leaks.
Finally, comparing GAE's cost with Heroku or Digital Ocean isn't an apples-to-apples comparison: GAE is a PaaS, not an IaaS, so IMHO it's expected to be more expensive. Choosing one or the other is really up to you.

Front End Instance Hours reach limit super fast

I am worried about why my Google App Engine application uses up its Front End Instance Hours so fast. After about 1 hour each day, my instance hours reach their quota. Why is this happening? I have already read some articles about this, but they didn't solve it. What are the right values for Idle Instances and Pending Latency? Thanks for your help.

In your Application Dashboard, go to Application Settings
Under Performance, check the Frontend Instance Class: an F1 costs you one instance hour per hour, an F2 costs 2, etc. You probably want it set to F1.
Set Idle Instances and Pending Latency to automatic-automatic; this means App Engine will scale your instances down to the minimum required.
Assuming you have low volume and no particular memory or CPU requirements, these settings will allow you to run all day for free.
If you are running any backends (check under Main -> Backends), these will also consume instance hours based on their type (B1, B2, etc.). You can make them more cost-effective by making them dynamic.

My guess is that your instances are staying active for the default 12 hours after the last activity, which, for a Cloud SQL instance in a test environment, causes a lot of extra charges. I haven't yet figured out how to shut down instances programmatically, but you can change the default idle time before shutdown in the appengine-web.xml file (for Java) or the app.yaml file (for Python). I changed my .xml file so that my instances shut down after five minutes of inactivity by adding the following lines immediately before the final </appengine-web-app> line:
<basic-scaling>
  <idle-timeout>5m</idle-timeout>
</basic-scaling>
I found this information on the following page: https://developers.google.com/appengine/docs/java/modules/
The Python information can be found here:
https://developers.google.com/appengine/docs/python/modules/
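To see why the idle timeout matters so much, here is a simplified model, assuming an instance starts on the first request, stays up until `idle_timeout` minutes after the last request it served, and is billed for that whole span; requests are treated as instantaneous and other billing details are ignored. `billed_minutes` is a hypothetical helper:

```python
def billed_minutes(request_times, idle_timeout):
    """Minutes an instance stays up under a simple idle-timeout model.

    request_times: request arrival times in minutes.
    idle_timeout: minutes without requests before the instance shuts down.
    """
    if not request_times:
        return 0
    times = sorted(request_times)
    total = 0
    start, end = times[0], times[0] + idle_timeout
    for t in times[1:]:
        if t <= end:   # request arrives before shutdown: stay up longer
            end = t + idle_timeout
        else:          # instance already shut down: close this span
            total += end - start
            start, end = t, t + idle_timeout
    return total + (end - start)


hourly = [0, 60, 120]              # three requests, one hour apart
print(billed_minutes(hourly, 5))   # 15: three short 5-minute spans
print(billed_minutes(hourly, 720)) # 840: a 12-hour timeout keeps it up 14 h
```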

Why does Google shut down my resident instance even though minimum idle instances is set to 1?

I've just started playing with GAE. Today I noticed that GAE shuts down my resident instance even though minimum idle instances is set to 1, which causes a cold start for the next request.
Here are the settings:
1. one simple frontend app, nothing else
2. minimum idle instances set to 1
3. billing enabled and no charges, which means the shutdown is not caused by a budget issue
4. an external Java process making a simple request every hour
From the instance chart in the admin console, it's obvious that at the -1.5 hour and -0.5 hour points GAE spawned another dynamic instance to serve the external request (or something similar), and shut down both the resident and dynamic instances after 15 minutes. The zero-instance situation lasted another 15 minutes until a resident instance was created again.
Has anyone had similar issues, or does anyone have ideas? Thanks.

Yes, that has happened to us all. Resident instances do not shut down unless manually forced to. From the GAE Console:
Idle Instances (another way of referring to resident instances) are pre-loaded with your application code, so when a new Instance is needed, it can serve traffic immediately, thus avoiding high latency during load spikes.
Resident instances start serving pages; if they become too busy, they become dynamic instances and another resident instance is started in their place. For that reason, you will sometimes see a resident instance that is younger than its dynamic counterparts.

Why is the number of "billed" instances so much greater than the number of "active" instances?

Performing load tests on my app, I noticed that the Instances dashboard graph shows a pretty big difference between the number of active and billed instances:
What do active and total mean?
Also, after spending the day running load tests, here's what I see:
In the first peak, the number of billed instances pretty much matches the number of total instances. Then, on subsequent loads, the number of billed instances sits between total and active.
Update 2013-02-21: I did another batch of load tests today, and I'm still seeing variance in where the billed instances stand relative to total and active:
How are these numbers calculated? How should I interpret them, given that I'm trying to forecast our operational costs based on these numbers?

It seems (I believe) that if you select the F2 instance class in the application settings, each active F2 instance is counted as 2 billed instances. If you select F4, each counts as 4 billed instances, and so forth.
Total instances includes instances that are instantiated but not billed: a kind of "gift" from Google. If more requests arrived that needed more instances, GAE would not need to start a new instance but would use one of those "non-active" ones. When the load rises, GAE starts new instances; when the load falls, GAE keeps instances around for a while but does not charge you for them. They are shut down eventually if the load does not rise again.
