I needed to rebuild a database and noticed that my write operations were extremely high after I deleted a database kind. Do those count as write operations?
Yes. There is a separate line for deleting entities in App Engine's Datastore pricing:
https://developers.google.com/appengine/pricing
This changed as of July 1, 2016: deletes now count against "Entity Deletes" rather than writes, and they are much cheaper, at $0.02 per 100,000 entities. The pricing page explains the details.
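At that rate the cost of a bulk wipe is easy to estimate. A quick sketch using the $0.02 per 100,000 figure quoted above (the entity count is made up for illustration):

```python
# Cost of entity deletes under the post-July-2016 Datastore pricing:
# a flat rate per delete, independent of how many indexed properties
# the entity has.

RATE_PER_DELETE = 0.02 / 100_000  # $0.02 per 100,000 deletes

def delete_cost(num_entities):
    """Estimated dollar cost of deleting num_entities entities."""
    return num_entities * RATE_PER_DELETE

# Example: wiping an entire kind of 8.8 million entities
# comes out to about $1.76.
print(delete_cost(8_800_000))
```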
Related
I was reading about optimizing my entities here: Understanding write costs (Java).
What I don't understand is this line:
When your application executes a Cloud Datastore put operation
I'm using Node.js, and the Node.js documentation mentions no put command, so I'm confused about whether the extra index write costs apply only to the insert command or also to other commands like update.
Update
I found this answer: Google Datastore new pricing effect operations
From what I understand, it doesn't matter if I let Datastore automatically index my properties, since I'm only charged once each time an entity is inserted, updated, or read.
I guess the only improvement I get from excluding indexes on some properties is decreased storage requirements?
Yes, the number of indexes won't increase write costs, although they do consume storage. See the official Datastore pricing model.
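A rough way to see where that storage goes: under the old pre-2016 write-cost formula, each indexed property value contributed two built-in index entries (one ascending, one descending). Assuming that ratio still describes the built-in index structure, you can sketch the entry count like this (the per-entry on-disk size varies, so this is an order-of-magnitude estimate only):

```python
# Rough sketch of how built-in index entries scale with indexed
# properties. Assumption: two built-in per-property index entries
# (ascending + descending) per indexed property value, per the old
# pre-2016 write-cost formula.

def builtin_index_entries(num_entities, indexed_props_per_entity):
    """Approximate number of built-in per-property index entries."""
    return num_entities * indexed_props_per_entity * 2

# 1M entities with 10 indexed properties -> 20M index entries,
# which is how a modest amount of raw entity data can balloon into
# several times as much index storage. Marking properties as
# unindexed (e.g. ndb's indexed=False) removes those entries.
print(builtin_index_entries(1_000_000, 10))
```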
I have about 20GB of data in the datastore.
The builtin indexes (which I have no control over) have increased to ~200GB.
I backed up all the data to BigQuery and don't need it anymore.
I am trying to get out of Datastore, but I can't: the Datastore Admin console can't delete that many entities (the bulk-delete option uses MapReduce, which fails on quota within an hour or so), and deleting each entity programmatically is too expensive (> $1000, since the many indexes mean many write operations per delete).
Meanwhile, Google charges me $50/month for storing data I don't need :(
How do I close my Datastore project (not the App Engine project, just the Datastore part of it), or just wipe out all the data?
Please help me get out of this!
Wait until July 1, 2016, then delete all entities.
Starting July 1, 2016, the pricing is $0.02 per 100,000 deletes, regardless of indexed properties.
I am investigating what might be the best infrastructure for storing log files from many clients.
Google App Engine offers a nice solution that doesn't turn the process into an IT nightmare: load balancing, sharding, servers, user authentication, all in one place with almost zero configuration.
However, I wonder if the Datastore model is the right fit for storing logs. Each log entry should be saved as a single document; each client uploads its documents on a daily basis, and each can consist of 100K log entries per day.
Plus, there are some limitations and questions that could break the requirements:
60-second timeout on bulk transactions - How many log entries per second will I be able to insert? If 100K won't fit into the 60-second window, this will affect the design and the work that needs to go into the server.
5 inserts per entity per second - Is a transaction considered a single insert?
Post-analysis - Text search, searching for similar log entries across clients. How flexible and efficient is Datastore with these queries?
Real-time data fetch - Getting all the recent log entries.
The other option is to deploy an Elasticsearch cluster on Google Compute Engine and write the server ourselves, fetching the data from ES.
Thanks!
Bad idea to use Datastore, and even worse if you use entity groups with parent/child, as a comment mentions when comparing performance.
Those numbers do not apply, and Datastore is not at all designed for what you want.
BigQuery is what you want. It's designed for this, especially if you later want to analyze the logs in a SQL-like fashion. Any more detail requires that you ask a specific question, as it seems you haven't read much about either service.
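For the BigQuery route, most of the work is shaping each log entry into a flat JSON row for streaming inserts. A minimal sketch — the field names are hypothetical, and the google-cloud-bigquery call at the end is shown for context only and assumes a dataset and table you have already created:

```python
def to_bq_rows(entries, client_id):
    """Shape raw log entries (dicts) into flat JSON-serializable rows
    suitable for a BigQuery streaming insert. Field names here are
    illustrative, not a fixed schema."""
    rows = []
    for e in entries:
        rows.append({
            "client_id": client_id,
            "timestamp": e["ts"],                    # e.g. ISO-8601 string
            "severity": e.get("severity", "INFO"),   # default when absent
            "message": e["msg"],
        })
    return rows

rows = to_bq_rows([{"ts": "2016-05-01T12:00:00Z", "msg": "boot"}], "client-42")
print(rows[0]["severity"])  # INFO (defaulted)

# With the google-cloud-bigquery library (assumed setup, not run here):
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   errors = client.insert_rows_json("my_dataset.logs", rows)
#   # an empty error list means all rows were accepted
```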
I do not agree. Datastore is a fully managed NoSQL document database; you can store the logs you want in this type of storage and query them directly in Datastore. The benefit of using it instead of BigQuery is that it is schemaless: in BigQuery you have to define the schema before inserting the logs, which is not necessary with Datastore. Think of Datastore as the MongoDB of a log-analysis use case on Google Cloud.
I have about 8.8 million entities for a particular kind. They take up 5GB of space.
The built-in indexes for this kind take up 50GB of space.
I did some tests, and deleting 100k entries produces over a million Datastore write operations.
Since Datastore writes cost ~$1 per million ops, it looks like it will cost me at least $100 to delete this kind.
Is there any shortcut to doing this? I did try the built-in MapReduce 'delete' in the App Engine interface, but it started burning through my daily quota quite fast, so I stopped it.
So the question is: is there any inexpensive/free way to delete a kind that I am missing?
Enable the Datastore Admin feature in your GAE app. Once it's enabled, open Datastore Admin in the Admin Console. Among other things, it allows you to bulk-delete all entities of a kind. While Google says:
Caution: This feature is currently experimental. We believe it is the fastest way to bulk-delete data, but it is not yet stable and you may encounter occasional bugs.
...they don't say what the pricing on bulk delete is. It might be the same as for Datastore writes. If it is, then 100k ops will cost $0.09, resulting in a total cost of $0.09 / 100,000 * 8,800,000 = $7.92.
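The arithmetic in that estimate checks out, under the stated assumption that bulk delete is billed one op per entity at the quoted write rate:

```python
# Verifying the bulk-delete cost estimate above, assuming bulk delete
# is billed at the quoted write rate of $0.09 per 100k operations,
# one operation per entity.
WRITE_RATE = 0.09 / 100_000
ENTITIES = 8_800_000

cost = ENTITIES * WRITE_RATE
print(round(cost, 2))  # 7.92
```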
Using App Engine with Python and the HRD, retrieving records sequentially (via an indexed field that is an incrementing integer timestamp), we get 15,000 records returned in 30-45 seconds. (Batching and limiting are used.) I did experiment with running queries on two instances in parallel but still achieved the same overall throughput.
Is there a way to improve this overall number without changing any code? I'm hoping we can just pay some more and get better database throughput. (You can pay more for bigger frontends but that didn't affect database throughput.)
We will be changing our code to store multiple underlying data items in one database record, but hopefully there is a short term workaround.
Edit: These are log records being downloaded to another system. We will fix it in the future and know how to do so, but I'd rather work on more important things first.
Try splitting the records across different entity groups. That might force them onto different physical servers. Read the entity groups in parallel from multiple threads or instances.
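That suggestion can be sketched with a thread pool. Here fetch_group is a stand-in for the real per-entity-group Datastore query (in the actual app it would run the timestamp-ordered query with an ancestor filter); the names and the group split are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_group(group_id):
    """Stand-in for a Datastore query scoped to one entity group.
    The real version would run the indexed-timestamp query with an
    ancestor filter for group_id."""
    return [f"record-{group_id}-{i}" for i in range(3)]

def fetch_all(group_ids, max_workers=4):
    """Query the entity groups in parallel and merge the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(fetch_group, group_ids)
    return [record for batch in results for record in batch]

# Three groups of three stub records each -> nine records total.
print(len(fetch_all([1, 2, 3])))  # 9
```

Whether this actually raises throughput depends on the groups really landing on different servers, which you can't control directly.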
Using a cache might not work well for large tables.
Maybe you can cache your records, like use Memcache:
https://developers.google.com/appengine/docs/python/memcache/
This could definitely speed up your application's data access. I don't think App Engine Datastore is designed for speed but for scalability. Memcache, however, is.
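The usual shape of this is the cache-aside pattern. A sketch with a plain dict standing in for App Engine's memcache client (in the real app, the cache calls would be memcache.get and memcache.set from google.appengine.api, and load_records_from_datastore is a hypothetical query function):

```python
# Cache-aside: check the cache first, fall back to the Datastore
# query, then populate the cache. The dict stands in for memcache.

cache = {}

def load_records_from_datastore(key):
    # Placeholder for the real (slow) Datastore query.
    return [f"{key}-record-{i}" for i in range(3)]

def get_records(key):
    records = cache.get(key)          # memcache.get(key) on GAE
    if records is None:
        records = load_records_from_datastore(key)
        cache[key] = records          # memcache.set(key, records) on GAE
    return records

get_records("logs")   # first call hits the Datastore
get_records("logs")   # second call is served from the cache
```

Note the caveat in the comment above: for a 15,000-record sequential download, the working set may be too large or too cold for this to help much.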
BTW, if you are concerned about the performance GAE gives for what you pay, then maybe you can try setting up your own App Engine cloud with:
AppScale
JBoss CapeDwarf
Both have active community support. I'm using CapeDwarf in my local environment; it is still in beta, but it works.
Move to one of the in-memory databases. If you have Oracle Database, using TimesTen will improve the throughput multifold.