How to estimate hosting services cost on GAE? - google-app-engine

I'm building a system which I plan to deploy on Google App Engine. Current pricing is described here:
Google App Engine - Pricing and Features
I need an estimate of cost per client managed by the webapp. The cost won't be very accurate until I have completed the development. GAE uses such fine grained price calculation such as READs and WRITEs that it becomes a very daunting task to estimate operation cost per user.
I have an agile dev. process which leaves me even more clueless in determining my cost. I've been exploiting my users stories to create a cost baseline per user story. Then I roughly estimate how will the user execute each story workflow to finally compute a simplistic estimation.
As I see it, computing estimates for Datastore API is overly complex for a startup project. The other costs are a bit easier to grasp. Unfortunately, I need to give an approximate cost to my manager!
Has anyone undergone such a task? Any pointers would be great, regarding tools, examples, or any other related information.
Thank you.

Yes, it is possible to do cost estimate analysis for app engine applications. Based on my experience, the three major areas of cost that I encountered while doing my analysis are the instance hour cost, the datastore read/write cost, and the datastore stored data cost.
YMMV based on the type of app that you are developing, of course. If it is an intense OLTP application that handle simple-but-frequent CRUD to your data records, most of the cost would be on the datastore read/write operations, so I would suggest to start your estimate on this resource.
For datastore read/write, the cost for writing is generally much more expensive than the cost for reading the data. This is because write cost take into account not only the cost to write the entity, but also to write all the indexes associated with the entity. I would suggest you to read an article by Google about the life of a datastore write, especially the part about Apply Phase, to understand how to calculate the number of write per entity based on your data model.
To do an estimate of instance hours that you would need, the simplest approach (but not always feasible) would be to deploy a simple app to test how long would a particular request took. If this approach is undesirable, you might also base your estimate on the Google App Engine System Status page (e.g. what would be the latency for a datastore write for a particularly sized entity) to get a (very) rough picture on how long would it take to process your request.
The third major area of cost, in my opinion, is the datastore stored data cost. This would vary based on your data model, of course, but any estimate you made need to also take into account the storage that would be taken by the entity indexes. Taking a quick glance on the datastore statistic page, I think the indexes could increase the storage size between 40% to 400%, depending on how many index you have for the particular entity.

Remember that most costs are an estimation of real costs. The definite source of truth is here: https://cloud.google.com/pricing/.
A good tool to estimate your cost for Appengine is this awesome Chrome Extension: "App Engine Offline Statistics Estimator".
You can also check out the AppStats package (to infer costs from within the app via API).
Recap:
Official Appengine Pricing
AppStats for Python
AppStats for Java
Online Estimator (OSE) Chrome Extension

You can use the pricing calculator
https://cloud.google.com/products/calculator/

Related

google appstore, how to split fees per datastore namespace

I'd like to make a GAE app multi-tenant to cater to different clients (companies), database namespaces seems like a GAE endorsed solution. Is there a meaningful way to split GAE fees among client/namespaces? GAE costs for app are mainly depends on user activities - backend instances up time, because new instances are created or (after 15 min delay) terminated proportionally to the server load, not total volume of data user has or created. Ideally the way the fees are split should be meaningful and could be explained to the clients.
I guess the most fair fee splitting solution is just create a new app for a new client, so all costs reported separately, yet total cost will grow up, I expect few apps running on same instances will use server resources more economically.
Every app engine request is logged with a rough estimated cost measurement. It is possible to log the namespace/client associated with every request and query the logs to add up the estimated instance costs for that namespace. Note that the estimated cost field is deprecated and may be inaccurate. It is mostly useful as a rough guide to the proportion of instance cost associated with each client.
As far as datastore pricing goes, the cloud console will tell you how much data has been stored in each namespace, and you can calculate costs from that. For reads/writes, we have set up a logging system to help us track reads and writes per namespace (i.e. every request tracks the number of datastore reads and writes it does in each namespace and logs these numbers at the end of the request).
The bottom line is that with some investments into infrastructure and logging, it is possible to roughly track costs per namespace. But no, App Engine does not make this easy, and it may be impossible to calculate very accurate cost estimates.

How to configure App Engine for minimal cost?

I'm doing a prototype backend and in the near future I expect little traffic but while testing I consumed all my 300$ free trail.
How can I configure my app to consume the least possible resources? I need things like limiting the number of instances to 1, using a cheap machine, sleep whenever possible, I've read something about Client vs Backend intances.
With time I'll learn the config that best suits me, but now I need the CHEAPEST config to get going.
BTW: I am using managed-vms with Dart.
EDIT
I've been recommended to configure my app.yaml file, what options would you recommend to confront this issue?
There are two train of thought for your issue.
1) Optimization of code: This is very difficult for us as we are not privy to your App's usage and client-base and architecture. In general, it depends on what Google App Engine product you use the most, for example: Datastore API call (fetch, write, delete... etc...), BigQuery and Cloud SQL. Even after optimization, you can still incur a lot of cost depending on traffic.
2) Enforcing cheap operation: This is easier and I think this is what you want. You can manually enforce a daily budget (in your billing setup page) so the App never cost more than a certain amount per day. You can also artificially lower the maximum amount of idling instances to 0 and use the smallest instance possible (F1 for frontend).
For pricing details see this article - https://cloud.google.com/appengine/pricing#Billable_Resource_Unit_Costs
If you use managed VM -- you'll be billed for Compute Engine Instance prices, not for App Engine Instances, and, as I know, the minimum possible instance to use as Managed VM is "g1-small" which costs you $0.023 per hour full sustained usage (if it will be turned on all month), so you minimum bill will be 0.023 * 24 * 30 = $16.56 only for instance hours. Excluding disk and traffic. With minimum amount of datastore operations you may stay on free quota.
Every application consumes resources differently. To minimize your cost, you need to know what resources used the majority of your expenses and go from there.
If it is spent on extra instances that were just sitting there - then trim the number of instances to the minimum required and use a lower class instance. If you are seeing a lot of expense on datastore calls - then look at optimizing your entities and take advantage of memcache.
Lowest Cost for a simple app:
Use App Engine Standard. It scales to zero instances, so will not cost anything if there is no traffic. With App Engine Flex you will pay for the instance hours and the Flex (GCE) instances are bigger.
Use autoscaling with max instances, F1 instance class:
With autoscaling you do not need to guess how many instances you need. F1 are the smallest instances. Set the max instances in case you get DoS'd or more traffic than you can afford.
Stop Instances:
You can stop the App Engine versions when you do not expect the app to be used. The will be no charge for instance hours for either Standard or Flex. For Flex there will be disk charges. The app will be ready to go when you need it again.
App Engine Version Cleanup:
Versions are easy to create and harder to remove. Here is a post on project cleanup. See this post on App Engine cleanup
https://medium.com/google-cloud/app-engine-project-cleanup-9647296e796a

Performance expectations for an Amazon RDS instance

I am using Google App Engine for an app, and the app is currently hitting the datastore at a rate of around 2.5 million row writes, and 4.5 million row reads per day.
I am currently porting the app to Amazon Elastic Beanstalk and Amazon RDS due to the very high costs of running an application on GAE.
Based on the values above, how can I find out / estimate what type of RDS instance I will need for my requirements? Is the above a considerable amount of processing for, lets say a Small or Micro MySQL RDS instance to process in a day?
Totally depends on a number of factors:
Row size.
Field types and sizes.
Complexity of your queries (joins, etc).
Proper use of indexes.
Row contention and other possible bottlenecks.
Really hard to tell. But from experience, if you don't need fancy replication or sharding, the costs of the GAE datastore are usually higher as it offers total redundancy, distribution, scalability, etc.
My suggestion would be to write a quick program to benchmark a load on RDS that replicates what you are expecting. Should be easy to write if you forgo all the business rules and such and just do fake but randomized reads and writes.

Is GAE optimized for database-heavy applications?

I'm writing a very limited-purpose web application that stores about 10-20k user-submitted articles (typically 500-700 words). At any time, any user should be able to perform searches on tags and keywords, edit any part of any article (metadata, text, or tags), or download a copy of the entire database that is recent up-to-the-hour. (It can be from a cache as long as it is updated hourly.) Activity tends to happen in a few unpredictable spikes over a day (wherein many users download the entire database simultaneously requiring 100% availability and fast downloads) and itermittent weeks of low activity. This usage pattern is set in stone.
Is GAE a wise choice for this application? It appeals to me for its low cost (hopefully free), elasticity of scale, and professional management of most of the stack. I like the idea of an app engine as an alternative to a host. However, the excessive limitations and quotas on all manner of datastore usage concern me, as does the trade-off between strong and eventual consistency imposed by the datastore's distributed architecture.
Is there a way to fit this application into GAE? Should I use the ndb API instead of the plain datastore API? Or are the requirements so data-intensive that GAE is more expensive than hosts like Webfaction?
As long as you don't require full text search on the articles (which is currently still marked as experimental and limited to ~1000 queries per day), your usage scenario sounds like it would fit just fine in App Engine.
stores about 10-20k user-submitted articles (typically 500-700 words)
Maximum entity size in App Engine is 1 MB, so as long as the total size of the article is lower than that, it should not be a problem. Also, the cost for reading data in is not tied to the size of the entity but to the number of entities being read.
At any time, any user should be able to perform searches on tags and keywords.
Again, as long as the search on the tags and keywords are not full text searches, App Engine's datastore queries could handle these kind of searches efficiently. If you want to search on both tags and keywords at the same time, you would need to build a composite index for both fields. This could increase your write cost.
download a copy of the entire database that is recent up-to-the-hour.
You could use cron/scheduled task to schedule a hourly dump to the blobstore. The cron could be targeted to a backend instance if your dump takes more than 60 seconds to be finished. Do remember that with each dump, you would need to read all entities in the database, and this means 10-20k read ops per hour. You could add a timestamp field to your entity, and have your dump servlet query for anything newer than the last dump instead to save up read ops.
Activity tends to happen in a few unpredictable spikes over a day (wherein many users download the entire database simultaneously requiring 100% availability and fast downloads) and itermittent weeks of low activity.
This is where GAE shines, you could have very efficient instance usages with GAE in this case.
I don't think your application is particularly "database-heavy".
500-700 words is only a few KB of data.
I think GAE is a good fit.
You could store each article as a textproperty on an entity, with tags in a listproperty. For searching text you could use the search service https://developers.google.com/appengine/docs/python/search/ (which currently has quota limits).
Not 100% sure about downloading all the data, but I think you could store all the data in the blobstore (possibly as pdf?) and then allow users to download that blob.
I would choose NDB over regular datastore, mostly for the built-in async functionality and caching.
Regarding staying below quota, it depends on how many people are accessing the site and how much data they download/upload.

What are the rules of thumb when trying to decide if developing on Google App Engine platform is worthwhile

I have an idea for a web application and I am currently researching different platforms. I am really interested in Google App Engine, but it looks like it works pretty good for certain application types while it is less suitable for others (there are horror as well as success stories e.g. Goodbye Google App Engine vs. Why we are really happy with Google App Engine
There is also a similar negative story in this thread from 1 year ago, concluding GAE was not ready for commercial production platform: GAE as Production Platform. There are also other threads from 2009 talking about data select limits (1000 rows) that has since been lifted.
My app will essentially perform some mathematical analysis based on data pulled from external data feeds (could be some substantial amount of data), it would be real time only the first time data is downloaded for a specific item at hand and then stored and retrieved locally from the database at that point. There will be some additional external data pulls as scheduled intervals.
Based on this brief description, should I even bother starting on GAE? In general, what are the rules of thumb to try and decide if developing on GAE is suitable for a problem at hand? Also, what are the good examples of Apps in Production that use GAE. It looks like GAE App Gallery is not around anymore, but I would definitely appreciate any Web 2.0 App examples running on the app engine.
In your specific case I would double check these factors:
a. Is the mathematical analysis a long running CPU intensive job?
GAE is not designed for long running CPU intensive computational Jobs; this would lead to have an high billing cost and would force you to design your application to avoid some GAE limitations (10 minutes max per job, limited soft memory, CPU quota, etc. etc.).
b. Are you planning to retrieve external data using a mainstream API (twitter, yahoo, facebook)?
Your application shares the same pool of IPs with other applications; if the API you want to adopt does not allow authenticated request, your application will suffer hiccups caused by throttling/quota limits errors. I faced this problem here.
App Engine should work fine for your application. It's generally designed to serve, and to scale, sites that serve mostly user-facing traffic. Applications that it's not suitable for are things such as video transcoding, which rely heavily on backend processing, or things that have to shell out to native code, such as 3D graphics, etcetera.
Depends on what type of mathematical analysis are you doing. If your application is heavy in I/O, I would give it some pause. On GAE, you're kind of limited in your I/O options. You basically have the following:
RAM: I can't recall exactly, but GAE imposes a hard limit of around 200MB of RAM.
Datastore: You get plenty of space here, but it's slow compared to a cached local file system.
Memcache: Faster than datastore, but not nearly as fast as a cached disk. And worse, it's a cache, so there's no guarantee that it won't get wiped out.
External sources: These include calling out to external web-pages. Lots of flexibility, but very slow.
In sum, I would perhaps look at other options if you're doing heavy I/O on a medium-size dataset (>20MB and ~<2GB). These are probably non-issues for 90% of web-apps, although you should be aware of them.
All the negatives aside, working on GAE is a joyous experience. You spend more time programming and less time configuring. And it's really cheap.

Resources