I have about 20GB of data in the datastore.
The built-in indexes (which I have no control over) have grown to ~200GB.
I backed up all the data to BigQuery and don't need it anymore.
I am trying to get out of Datastore, but I can't: the Datastore Admin console can't delete that many entities (the bulk delete option uses MapReduce, which fails on quota within an hour or so), and deleting each entity programmatically is too expensive (> $1000, because the many indexes multiply the write operations).
Meanwhile Google charges me $50/month for storing data I don't need :(
How do I either close my Datastore project (not the App Engine project, just the Datastore part of it) or just wipe out all the data?
Please help me get out of this!
Wait until July 1, 2016. Then delete all entities.
Starting from July 1, 2016 the pricing is $0.02 per 100,000 deletes regardless of indexed properties.
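For anyone doing the delete programmatically after the price change, here is a minimal sketch using the google.cloud.datastore Python client; the project ID and kind name are placeholders, not from the question:

```python
# Minimal sketch: bulk-delete all entities of a kind in keys-only
# batches. At $0.02 per 100,000 deletes this is cheap post-July-2016.
from google.cloud import datastore

client = datastore.Client(project="your-project-id")  # placeholder

def delete_kind(kind, batch_size=500):
    """Delete every entity of `kind`, up to 500 keys per call."""
    while True:
        query = client.query(kind=kind)
        query.keys_only()  # skip fetching entity payloads we don't need
        keys = [entity.key for entity in query.fetch(limit=batch_size)]
        if not keys:
            break
        client.delete_multi(keys)

delete_kind("MyKind")  # placeholder kind name
```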
After reading the pricing for Google's new relational database, Spanner, I see that the cost is based on storage and use: they charge $0.90 per node per hour.
The question is: if I create the database for development and only use it 6 hours a day, 100 hours a month at most... do I pay only for the hours of active use (receiving queries), or for the whole month? Is the charging similar to App Engine instances?
In the first case there is no problem spending US$90 to test this new database, but if they charge for the whole month (whether it is used or not)... the cost rises to US$670/month...
Has anyone been using this database who can share the final invoiced cost?
In the tutorial they recommend deleting the database after testing, but for development, deleting the database and recreating the database and data every day is not practical.
Correct, you need to maintain at least 1 node to keep the data, and you need at least 1 node for every 2 TiB of data.
So, if you upload 50 TiB of data, you need to keep 25 nodes at a minimum to maintain the data.
More info - https://cloud.google.com/spanner/docs/limits
You are charged for any resources in your instances (while the nodes are running and storage is being used), even if you aren't actively issuing queries. It's like Compute Engine or Cloud SQL.
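To make the math concrete, here is a back-of-the-envelope sketch under the rates quoted in this thread (assuming a 744-hour month; storage is billed separately, so this only covers node cost):

```python
# Rough Spanner node-cost estimate, using the $0.90/node/hour rate
# from the question and the 1-node-per-2-TiB minimum from the answer.
import math

NODE_HOUR_RATE = 0.90   # USD per node per hour (rate quoted above)
HOURS_PER_MONTH = 744   # 24 x 31, matching the ~$670 figure above

def min_nodes(storage_tib):
    # At least 1 node always, and at least 1 node per 2 TiB stored.
    return max(1, math.ceil(storage_tib / 2))

def monthly_node_cost(storage_tib):
    # Nodes bill for every hour they exist, not just active hours.
    return min_nodes(storage_tib) * NODE_HOUR_RATE * HOURS_PER_MONTH

print(monthly_node_cost(0.1))  # 1 node   -> ~$670/month
print(monthly_node_cost(50))   # 25 nodes -> ~$16,740/month
```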
I am investigating what might be the best infrastructure for storing log files from many clients.
Google App Engine offers a nice solution that doesn't turn the process into an IT nightmare: load balancing, sharding, servers, user authentication - all in one place with almost zero configuration.
However, I wonder if the Datastore model is right for storing logs. Each log entry would be saved as a single document; each client uploads its document on a daily basis, and it can consist of 100K log entries per day.
Plus, there are some limitations and questions that could break the requirements:
60-second timeout on bulk transactions - how many log entries per second will I be able to insert? If 100K won't fit into the 60-second window, this will affect the design and the work that needs to go into the server.
5 inserts per entity per second - is a transaction considered a single insert?
Post-analysis - text search, searching for similar log entries across clients. How flexible and efficient is Datastore with these queries?
Real-time data fetch - getting all the recent log entries.
The other option is to deploy an Elasticsearch cluster on Google Compute Engine and write the server ourselves, fetching the data from ES.
Thanks!
It's a bad idea to use Datastore for this, and even worse if you use entity groups with parent/child relationships, as a comment mentions when comparing performance.
Those numbers do not apply, but Datastore is simply not designed for what you want.
BigQuery is what you want. It's designed for this, especially if you later want to analyze the logs in a SQL-like fashion (see the sketch below). Any more detail requires that you ask a specific question, as it seems you haven't read much about either service.
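For illustration, a rough sketch of streaming log entries into BigQuery with the google-cloud-bigquery Python client; the project, dataset, table, and row fields here are made-up examples, and the table's schema would have to be defined up front:

```python
# Sketch: stream log rows into an existing BigQuery table, then
# analyze them later with ordinary SQL. All names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")
table = client.get_table("your-project-id.logs.entries")  # schema pre-defined

rows = [
    {"client_id": "client-42", "timestamp": "2016-06-01T12:00:00Z",
     "level": "ERROR", "message": "disk full"},
]
errors = client.insert_rows_json(table, rows)  # streaming insert
if errors:
    print("insert failed:", errors)
```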
I do not agree. Datastore is a fully managed NoSQL document database: you can store the logs you want in this type of storage and query them directly in Datastore. The benefit of using it instead of BigQuery is that it is schemaless - in BigQuery you have to define the schema before inserting the logs, which is not necessary with Datastore. Think of Datastore as the MongoDB of log-analysis use cases on Google Cloud.
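To illustrate the schemaless point, a minimal sketch with the google.cloud.datastore Python client - two entities of the same kind with different property sets, no schema declared anywhere (names are illustrative):

```python
# Sketch: same kind, different properties, no schema or migration.
from google.cloud import datastore

client = datastore.Client(project="your-project-id")  # placeholder

a = datastore.Entity(client.key("LogEntry"))
a.update({"level": "INFO", "message": "started"})

b = datastore.Entity(client.key("LogEntry"))
b.update({"level": "ERROR", "message": "disk full",
          "host": "worker-7"})  # extra field, no schema change needed

client.put_multi([a, b])
```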
I needed to rebuild a database and noticed that my write operations were extremely high after I deleted a Datastore kind. Do deletes count as write operations?
Yes. There is a separate line for deleting entities in App Engine's Datastore pricing:
https://developers.google.com/appengine/pricing
This changed as of July 1st, 2016.
Deletes now count against "Entity Deletes" rather than writes. They are also much cheaper, at $0.02 per 100,000 entities - deleting 10 million entities, for example, costs only $2.00.
We have a pricing page that explains it.
I have the Datastore Admin functionality installed in my AppEngine Console. I have several Entity Kinds that have over 100,000 entities each. I need to clear out the existing entities and reload the data. I have found using the Datastore Admin that:
Small numbers of entity deletes are fine - small being fewer than 20 entities. I can run the Datastore Admin delete, the records are removed immediately, the entities disappear from the Datastore Admin screen, and the entity kinds can no longer be found in the Datastore Viewer.
For large entity kinds, greater than 100,000 entities, the Datastore Admin delete jobs run fine and report no errors, but Datastore Admin shows the exact same number of entities. Thinking this might just be an issue with statistics not getting updated, I used the Datastore Viewer, and there is still data in each of the entity kinds I tried to delete.
Because the Datastore Viewer does not show a total record count, I have no idea whether any of the entities were actually deleted (at least not without manually paginating through thousands of pages of data).
Does anyone have any idea what may be happening here? I have tried writing my own delete programs in Java and running them as backends, looping through each entity kind and deleting records in batches of 50, but the statistics still show the entities existing and the Datastore Viewer continues to show records.
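For reference, the batched-delete loop described above looks roughly like this in Python with ndb (a sketch, not the asker's actual Java code; batch sizes well above 50 are fine, since Datastore accepts up to 500 keys per call):

```python
# Sketch: delete all entities of a kind in keys-only batches.
from google.appengine.ext import ndb

def delete_all(kind_name, batch_size=500):
    query = ndb.Query(kind=kind_name)
    while True:
        # keys_only avoids fetching entity payloads we don't need
        keys = query.fetch(batch_size, keys_only=True)
        if not keys:
            break
        ndb.delete_multi(keys)
```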
I have about 8.8 million entities for a particular kind. They take up 5GB of space.
The built-in indexes for this kind take up 50GB of space.
I did some tests, and deleting 100k entities produces over a million Datastore write operations.
Since Datastore writes cost ~$1 per million ops, it looks like it will cost me at least $100 to delete this kind.
Is there any shortcut? I did try the built-in MapReduce 'delete' in the App Engine interface, but it started burning through my daily quota quite fast, so I stopped it.
So the question is: is there any inexpensive/free way to delete a kind that I am missing?
-s
Enable the Datastore Admin feature in your GAE app. Once it's enabled, open Datastore Admin in the Admin Console. Among other things, it allows you to bulk-delete all entities of a kind. While Google says:
Caution: This feature is currently experimental. We believe it is the fastest way to bulk-delete data, but it is not yet stable and you may encounter occasional bugs.
...they don't say what the pricing on bulk delete is. It might be the same as for Datastore writes. If it is, then 100k ops will cost $0.09, resulting in a total cost of $0.09 / 100,000 × 8,800,000 = $7.92.
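For what it's worth, the two estimates in this thread can be reproduced with a little arithmetic (rates taken from the posts above; the question quotes ~$1 per million write ops while this answer uses $0.09 per 100k, which is why the totals differ slightly):

```python
# Reproducing the thread's two estimates. The question's test showed
# ~10 write ops per deleted entity (1M+ ops per 100k deletes).
ENTITIES = 8_800_000
RATE_PER_OP = 0.09 / 100_000      # $ per write op (this answer's rate)
OPS_PER_ENTITY_DELETE = 10        # entity + its index rows (measured)

per_entity_cost = ENTITIES * OPS_PER_ENTITY_DELETE * RATE_PER_OP
bulk_cost = ENTITIES * 1 * RATE_PER_OP  # if bulk delete bills 1 op/entity

print("per-entity deletes: $%.2f" % per_entity_cost)   # $79.20
print("hypothetical bulk delete: $%.2f" % bulk_cost)   # $7.92
```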