GAE Pricing: Always On feature and instances charging

GAE Pricing: Always On feature and instances charging - google-app-engine

There's something I really don't get about the new pricing. As far as I can see, I am now billed (amongst others) for the number of "instance/hours". On the other hand, a while back I've opted for the "Always on" feature, which since then effectively has 3 "Resident" instances of my application always running.
Now, A.F.A.I.C.S. , on the old pricing model, the one where I was charged by CPU Time used, the "always on" feature was great, not only did it made the app more responsive, but since the instances were no longer started-up/torn-down when traffic was scarce, the CPU Time was lowered (and indeed this is visible on the dashboard).
But now, since I'm billed by Instance/hours, the fact that I have this "always on" option active doesn't in fact add a lot of money to my bill, even when those instances are not actually doing anything (simply because they're just there, always on)?
I'm asking this because since the new pricing model was activated, I have whopping increase in Frontend Instance Hours (right now it's 29.21 for the last 9 hours), where before the CPU Time never really came close to depassing the free quota.

The "Always On" feature does not exist as of 1.6.0. The equivalent replacement is setting the Min Idle Instances slider to 3 (and leaving the Max Idle Instances at "Automatic") in your Application Settings in the Admin Console.

add a lot of money to my bill, even when those instances are not actually doing anything (simply because they're just there, always on)?add a lot of money to my bill, even when those instances are not actually doing anything ...
The problem is that they are doing something. They are occupying RAM. The new pricing model attempts to more accurately model the underlying costs to Google, or at least that's what they tell us. You can change how many instances are always on by going to the admin interface. If you aren't really using all 3, try going down to 2 or 1. If your traffic spikes, more instances will be started up. You can also set a value for how much latency you want users to endure before new instances are spun up.

The scheduler might be spinning up more than one instance to respond to threads.
Is this in Java? You could try to make it threaded, to make it more responsive to lower latency.
You could also tweak the scheduler parameters to discourage it from spinning up more instances.

Related

Keeping GAE at a single instance with flexibility to scale

I have a relatively low traffic app that can easily be handled by a single instance more than 95% of the time. Occasionally, having more than one instance running would be helpful to provide a better user experience.
It seems that GAE should be able to automatically scale in this way, but I can't get GAE to keep only a single instance when traffic is low. This is what I have tried:
Set min instances to 1
Set max instances to 3
Set min pending latency to 1 second
Set max pending latency to automatic (and also 1 second)
With this configuration, GAE will just about always run two instances even though one is sufficient.
I know I can set max instances to one, but I want to be able to automatically scale when I need it.
Is it possible to do what I want?

Note that the min/max property that you are setting are for IDLE instances.
Set min instances to 1 means that you will ALWAYS have at least one instance running, even when there are no requests for over 15 minutes. This could be set to 0 if you have low traffic AND your app launches quickly, i.e. under 1-2 seconds, otherwise the users will have bad experience with very slow response on their first request.
Set max instances to 3 means that it's OK for GAE to keep up to three instances running at any time, even when there are only few requests. This could be set to 1 to save some costs but would make some requests slow (time it takes to start new instance + time to launch your app) when traffic increases.
The max-idle-instances does not limit the number of instances in the event of a traffic spike, your app will always scale and new instances will keep launching if needed. The min/max settings are only there to help handle a sudden increase in traffic and there is no way to limit the number of instances that can be launched.
Take a look at this article for some more details: Setting the Number of Idle Instances
Regarding your question, you could try decreasing the max-idle-instances to 1 and see if that helps. You don't have to worry about scaling, new instances will still launch if needed, just keep in mind that the experience might not be as smooth for your users. If you decreased the number of max-idle-isntances and you still see more than 1 instance running on very low traffic, then your app might need to be optimized and multi-threading might need to be enabled if it wasn't.

How to Gain Visibility and Optimize Quota Usage in Google App Engine?

How do I go about optimizing my Google App Engine app to reduce instance hours I am currently using/paying for?
I have been using app engine for a while and the cost has been creeping upwards. I now spend enough on GAE to invest time into reducing the expense. More than half of my GAE bill is due to frontend instance hours, so it's the obvious place to start. But before I can start optimizing, I have to figure out what's using the instance hours.
However, I am having difficulty trying to determine what is currently using so many of my frontend instance hours. My app serves many ajax requests, dynamic HTML pages, cron jobs, and deferred tasks. For all I know there could be some runaway process that is causing my instance usage to be so high.
What methods or techniques are available to allow me to gain visibility into my app to see where I am using instance hours?

Besides code changes (all suggestions in the other answer are good) you need to look into the instances over time graph.
If you have spikes and constant use, the instances created during the spikes wont go to sleep because appengine will keep using them. In appspot application settings, change the "idle instances" max to a low number like 1 (or your actual daily average).
Also, change min latency to a higher number so less instances will be created on spikes.
All these suggestions can make an immediate effect on lowering your bill, but its just a complement to the code optimizations suggested in the other answer.

This is a very broad question, but I will offer a few pointers.
First, examine App Engine's console Dashboard and logs. See if there are any errors. Errors are expensive both in terms of lost business and in extra instance hours. For example, tasks are retried several times, and these reties may easily prolong the life of an instance beyond what is necessary.
Second, the Dashboard shows you the summary of your requests over 24 hours period. Look for requests with high latency. See if you can improve them. This will both improve the user experience and may reduce the number of instance hours as more requests can be handled by each instance.
Also look for data points that surprise you as a developer of your app. If you see a request that is called many more times that you think is normal, zero in on it and see what it is happening.
Third, look at queues execution rates. When you add multiple tasks to a queue, do you really need all of them to be executed within seconds? If not, reduce the execution rate so that the queue never needs more than one instance.
Fourth, examine your cron jobs. If you can reduce their frequency, you can save a bunch of instance hours. If your cron jobs must run frequently and do a lot of computing, consider moving them to a Compute Engine instance. Compute Engine instances are many times cheaper, so having such an instance run for 24 hours may be a better option than hitting an App Engine instance every 15 minutes (or even every hour).
Fifth, make sure your app is thread-safe, and your App Engine configuration states so.
Finally, do the things that all web developers do (or should do) to improve their apps/websites. Cache what can be cached. Minify what needs to be minified. Put images in sprites. Split you code if it can be split. Use Memcache. Etc. All of these steps reduce latency and/or client-server roundtrips, which helps to reduce the number of instances for the same number of users.

Ok, my other answer was about optimizing at the settings level.
To trace the performance at a granular level use the new cloud trace relased today at google i/o 2014.
http://googledevelopers.blogspot.com/2014/06/cloud-platform-at-google-io-enabling.html

What is a Google App Engine instance?

I am trying to estimate the monthly costs for having GAE for in-app store and I do not really understand what is an instance and what can I do within one instance.
Can I just have one instance with multiple threads to deal with multiple clients? And as I have 28 hours of free instance per app per day (http://cloud.google.com/pricing/), does it mean that I would not pay for my server app running all the time?

An instance is an instance of a virtual server, running your code, that is able to serve requests to clients. This is usually done in parallel (Goroutines, Java threads, Python threads with 2.7) for most efficient usage of available resources.
Response times depends on what you're doing in your code, and it's usually IO dependent. If you have a waterfall of serial database lookups, it takes longer than if you only have a single multiget and perhaps an async write.
Part of the deal with GAE is that Google handles the elasticity for you. If there are a lot of connections waiting, new instances will start as needed (until your quota is exhausted). That means it can be difficult to estimate cost upfront, because you don't know exactly how efficient your code is and how much resources you'll need. I recommend a scheme where more usage means more income, and income per request is higher than cost per request. :)
You can tweak settings, saying you want requests to wait in queue, or always have a couple of spare instances ready to serve new requests, which will affect cost for you and response times for users.
In an IaaS scenario you could say that you will use five instances and that's the cost, but in reality you might need only 1 at night local time, and 25 the rest of the day, which means your users would most likely see dropped connections or otherwise have a negative user experience.
A free instance is normally able to handle test traffic during development without exhausting the quota.

Well AppEngine may decide you need to have more than one instance running to handle the requests and so will start another one. You won't be able to limit it to one running instance. In fact, it's sometimes unclear why AE starts another instance when it seems like the requests are low, but it will if it decides it needs another warm instance to be ready to handle requests if the serving instance(s) are too near their limit.

App Engine loading request even when idle instance available

I have a simple app running on App Engine but I'm having odd problems with latency. It's a Python 2.7 app and a loading request takes between 1.5 and 10 secs (I guess depending on how GAE is feeling). This is a low traffic site right now, so previously GAE was sitting with no idle instances and most request were loading requests, resulting in a long wait time on the first page view.
I've tried configuring the minimum number of idle instances to "1" so that these infrequent page views can immediately hit a warm instance.
However, I've seen several cases now where even with one instance sitting unused, GAE will route an incoming request to a loading instance, leaving the warm instance untouched:
gae dashboard showing odd scheduling
How can I prevent this from happening? I feel I must be understanding something wrong, because I certainly don't expect this behavior.
Update: Also, what makes this even less comprehensible is that the app has threadsafe enabled, so I really don't understand why GAE would get flustered and spin up an instance for a single, lone request.

Actually, I believe this is normal behavior. Idle instances are supposed to guarantee a minimum number of instances always available (for spiky load).
So, when some requests start coming in, they are initially served by idle instances, but at the same time AE scheduler will start launching new instances to always guarantee the same amount of idle instances even during suddenly increased load. That is, to "cover" for those idle instances that became busy serving requests.
It is described in details on Adjusting Application Performance page.

Arrrgh! Suffer from this myself. This topic-area has come up in several threads (GAE groups & SO). If someone can dial-in the settings for a low-traffic site (billing on/off), that would be a real benefit. IIRC, someone with what I think is deep GAE experience noted in one thread that the Scheduler does not do well with very low volume apps. I have also seen wildly different startup times within a relatively short period of time. Painful to see a spinup take 700ms then 7000ms just a few minutes later. Overall the issue is not so much the cost to me, but more so the waste of infrastructure resources. In testing I've had two instances running despite having pinged the app with an RPC once every few minutes. If 50k other developers are similarly testing, that could accumulate into a significant waste.

Is app engine more expensive when it's slower?

There have been quite a few occasions recently when app engine appears to run slower. To some degree that's understandable with the architecture of their cloud platform. I'm not talking about new server instances - just requests to warm servers. I'm also just referring to CPU, not datastore API, but I do wonder about that as well.
It seems that during these slow periods I get a lot more yellow warnings on my requests - saying I am using a lot of CPU. Certainly they take longer to complete during this period. What concerns me is that during these slow periods, my billable CPU seems to go up.
So to be clear - when app engine is fast, a request might complete in 100ms. In a slow period, it might take more than 1s for the same request. Same URI, same caching, same processing path, same datastore, same indexes - much more CPU. The yellow warnings, as I understand it, are referring to billable CPU usage, and there's many more of them when app engine is slower.
This seems to set up a bizarre situation where my app costs more to run when app engine performance is worse. This means google makes more money the more poorly the platform performs (up to the point where it fails or customers leave). Maybe I've got the situation all wrong, and it doesn't work like that - but if it does work like that, then as a customer the pressures and balances there are all wrong. That's not intimating any wrong-doing on google's part - just that the relationships between those two things don't seem right.
It almost seems like google's algorithm goes something like - 'If I give a processing job to a CPU and start my watch, then stop it when the job returns I get the billable CPU figure.' i.e. it doesn't measure CPU work at all. Surely that time should be divided by the number of processing jobs being concurrently executed plus some extra to cover the additional context switching. I'm sure that stuff is hard to measure - perhaps that's the reason.
I guess you could argue it is fair that you pay more when app engine is in high demand, but that makes budgeting close to impossible - you can't generate stats like '100 users costs me $1 a day', because that could change for a whole host of reasons - including app engine onboarding more customers than the infrastructure can realistically handle. If google over-subscribes app engine then all customers pay more - it's another relationship that doesn't sound right. Surely google's costs should go down as they onboard more customers, and those customers use more resources - based on economies of scale.
Should I expect two identical requests in my app to cost me roughly the same amount each time they run - regardless of how much wall-time app engine takes to actually complete them? Have I misunderstood how this works? If I haven't, is there a reason why I shouldn't be worried about it in the long term? Is there some documentation which makes this situation clearer? Cheers,
Colin

It would be more complicated, but they could change the billing algorithm to be a function of load. Or perhaps they could normalize the CPU measurements based on the performance of similar calls in the past.
I agree that this presents problems for the developers.

Yes this is true. It is a bummer. It also takes them over a second to start up my Java application (which I was billed for) every time they decided my site was in low demand, and didn't need the resources.
I ended up using a cron to auto ping my site every minute to keep it warm.. doing all the wasted work made my bill cheaper, as it didn't have the startup time, instead it just had lots of 2ms pings...

This question appears old and I think the pricing scheme must have changed...
The Google App Engine charges for "instance hours" and the instances currently spawned are viewable in the GAE console. And Google provides adjustments so you can decide cost vs latency for your app.
https://developers.google.com/appengine/docs/adminconsole/performancesettings
I did noticed that if the front-end is bogged down hitting a common backend resource that GAE will spawn a bunch of instances to get latency down. And you will pay for those instance hours even though latency/throughput doesn't improve. The adjustments I mentioned seem to help with that.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight