GAE: About the Usage of High Replication Data - google-app-engine

I'm using Google App Engine and High Replication Datastore.
I checked the Dashboard of one of my GAE app today, I found that High Replication Data became 52%, 0.26 of 0.50 GBytes in the Billing Status.
I don't use so much data for the app, so I also checked Datastore Statistics and Total number of entities is about 60,000 and Size of all entities is only 42 MBytes, which is far from 0.26 GBytes.
What is the difference between the Usage in the Dashboard and in the Datastore Statistics? And how can I reduce the former Usage?
Thank you.

Because the datastore creates automatic indexes for your entities. In addition if you have custom indexes, they will also need storage.
You can reduce this by removing unused indexes and by not indexing properties, which are not needed for queries (setting indexed=false).
In general however, you need to get used to the idea that the storage for your entities is not the same as total storage needed for the datastore ;)

Related

Is google Datastore recommended for storing logs?

I am investigating what might be the best infrastructure for storing log files from many clients.
Google App engine offers a nice solution that doesn't make the process a IT nightmare: Load balancing, sharding, server, user authentication - all in once place with almost zero configuration.
However, I wonder if the Datastore model is the right for storing logs. Each log entry should be saved as a single document, where each clients uploads its document on a daily basis and can consists of 100K of log entries each day.
Plus, there are some limitation and questions that can break the requirements:
60 seconds timeout on bulk transaction - How many log entries per second will I be able to insert? If 100K won't fit into the 60 seconds frame - this will affect the design and the work that needs to be put into the server.
5 inserts per entity per seconds - Is a transaction considered a single insert?
Post analysis - text search, searching for similar log entries cross clients. How flexible and efficient is Datastore with these queries?
Real time data fetch - getting all the recent log entries.
The other option is to deploy an elasticsearch cluster on goole compute and write the server on our own which fetches data from ES.
Thanks!
Bad idea to use datastore and even worse if you use entity groups with parent/child as a comment mentions when comparing performance.
Those numbers do not apply but datastore is not at all designed for what you want.
bigquery is what you want. its designed for this specially if you later want to analyze the logs in a sql-like fashion. Any more detail requires that you ask a specific question as it seems you havent read much about either service.
I do not agree, Data Store is a totally fully managed no sql document store database, you can store the logs you want in this type of storage and you can query directly in datastore, the benefits of using this instead of BigQuery is the schemaless part, in BigQuery you have to define the schema before inserting the logs, this is not necessary if you use DataStore, think of DataStore as a MongoDB log analysis use case in Google Cloud.

Inexpensive/free way to delete a kind from datastore?

I have about 8.8 million entities for a particular kind. They take up 5GB of space.
The built-in indexes for this kind take up 50GB of space.
I did some tests, and deleting 100k entries produces over a million data store write operations.
Since datastore writes cost ~$1 for a million ops, it looks like it will cost me at least $100 to delete this kind.
Is there any shortcut to doing this? I did try using the built-in mapreduce 'delete' in the appengine interface, but it started burning through my daily quota quite fast so I stopped it.
So the question is: is there any inexpensive/free way to delete a kind that I am missing?
-s
Enable the Datastore Admin feature in your GAE app. Once it's enabled open Datastore Admin in the Admin Console. Among other things it allows you to bulk delete all entities of a kind. While Google says:
Caution: This feature is currently experimental. We believe it is the fastest way to bulk-delete data, but it is not yet stable and you may encounter occasional bugs.
.. they don't say what the pricing on bulk delete is. It might be the same as for Datastore Writes. If it is then 100k ops will cost $0.09 resulting in a total cost of $0.09 / 100,000 * 8,800,000 = $7.92.

Memcache System - Keys getting dropped frequently

We currently have our application hosted in Google App Engine. Billing is enabled to that application. This application is still in beta that we are using for testing purpose. We have a logic of serving data from the Memcache if present, if not then we get the data from the datastore and update the memcache and serve the data. We are encountering strange behaviour related to Memcache. The data related to some keys in Memcache is getting dropped after few minutes after being set. We tried setting expiration time for the keys in the memcache, even that does not seem to work. Since the data is getting dropped from the memcache the data is again from the datastore which is increasing the billing for our application.
Currently nearly 80% of the billing is related to datastore read. The datastore read is high as the memcache is not working efficiently as it should be. Any insight why we are facing this issue would be really helpful.
Just an FYI, we are having around 75000 keys in the memcache with total size of 100 MB data. Our structure demands keeping such large number of keys in memcache, which I think should not be an issue.
Our application is being by 10 users and the billing amount per day is coming to around $40.
Thanks,
Krish
Unfortunately memcache will evict keys as and when it requires. Setting the time to expire only means the item will be in memcache for up to the expiry time.
Take a look at the docs regarding eviction.
Also, take a look at this for some more insight into ways around memcache issues.
Regarding your data structure, perhaps you could post a new question and we can see if others have advice for you.

How to estimate hosting services cost on GAE?

I'm building a system which I plan to deploy on Google App Engine. Current pricing is described here:
Google App Engine - Pricing and Features
I need an estimate of cost per client managed by the webapp. The cost won't be very accurate until I have completed the development. GAE uses such fine grained price calculation such as READs and WRITEs that it becomes a very daunting task to estimate operation cost per user.
I have an agile dev. process which leaves me even more clueless in determining my cost. I've been exploiting my users stories to create a cost baseline per user story. Then I roughly estimate how will the user execute each story workflow to finally compute a simplistic estimation.
As I see it, computing estimates for Datastore API is overly complex for a startup project. The other costs are a bit easier to grasp. Unfortunately, I need to give an approximate cost to my manager!
Has anyone undergone such a task? Any pointers would be great, regarding tools, examples, or any other related information.
Thank you.
Yes, it is possible to do cost estimate analysis for app engine applications. Based on my experience, the three major areas of cost that I encountered while doing my analysis are the instance hour cost, the datastore read/write cost, and the datastore stored data cost.
YMMV based on the type of app that you are developing, of course. If it is an intense OLTP application that handle simple-but-frequent CRUD to your data records, most of the cost would be on the datastore read/write operations, so I would suggest to start your estimate on this resource.
For datastore read/write, the cost for writing is generally much more expensive than the cost for reading the data. This is because write cost take into account not only the cost to write the entity, but also to write all the indexes associated with the entity. I would suggest you to read an article by Google about the life of a datastore write, especially the part about Apply Phase, to understand how to calculate the number of write per entity based on your data model.
To do an estimate of instance hours that you would need, the simplest approach (but not always feasible) would be to deploy a simple app to test how long would a particular request took. If this approach is undesirable, you might also base your estimate on the Google App Engine System Status page (e.g. what would be the latency for a datastore write for a particularly sized entity) to get a (very) rough picture on how long would it take to process your request.
The third major area of cost, in my opinion, is the datastore stored data cost. This would vary based on your data model, of course, but any estimate you made need to also take into account the storage that would be taken by the entity indexes. Taking a quick glance on the datastore statistic page, I think the indexes could increase the storage size between 40% to 400%, depending on how many index you have for the particular entity.
Remember that most costs are an estimation of real costs. The definite source of truth is here: https://cloud.google.com/pricing/.
A good tool to estimate your cost for Appengine is this awesome Chrome Extension: "App Engine Offline Statistics Estimator".
You can also check out the AppStats package (to infer costs from within the app via API).
Recap:
Official Appengine Pricing
AppStats for Python
AppStats for Java
Online Estimator (OSE) Chrome Extension
You can use the pricing calculator
https://cloud.google.com/products/calculator/

Google App Engine Database : disk usage

I have a Google App engine Application storing 135 MBytes into its datastore, however when I check my quotas It tells me that I'm using 76% of my Free 1gb of stored data.
Is it because of the index ? How can it use so much diskspace?
Thanks
It could be due to indexes. Every property (with exception of some types) has "single property" indexes unless you explicitly disable indexing of that property. Since the indexes store the property name and the value, the impact on storage space can be quite significant. If you would like statistics on your index usage, star issue 2740.
If you are using a lot of tasks, your stored task bytes also counts against your storage usage. Also note that blobstore usage counts against your storage quota.

Resources