Google Datastore reverse an update or delete? - google-app-engine

Is there any way to reverse or rollback an update or set of updates (or deletions) in Google Cloud Datastore? I suppose the only way is rolling back a transaction or restoring from a backup that was previously made?
Or I suppose you could be clever and make duplicates of entities before updating or deleting them, and then delete these duplicates when a full backup is made?
Thanks!
Alex

There is no way to undo a delete or an update in the Datastore.
Google Cloud Storage supports object versioning. You can implement a similar mechanism in the Datastore by adding "version" and/or "last_update" properties to your entities. Then you can either keep the full history of changes and deletes, or discard old versions after a certain period of time.
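To make the versioning idea concrete, here is a minimal sketch. It is not Datastore code: `VersionedStore` is a hypothetical in-memory stand-in, and the `deleted` tombstone flag and the `rollback`/`prune` helpers are my own assumptions about how one might lay it out. Only the "version" and "last_update" properties come from the suggestion above.

```python
import copy
import time


class VersionedStore:
    """Hypothetical in-memory stand-in for a kind that keeps every version."""

    def __init__(self):
        # key -> list of version records, oldest first
        self._history = {}

    def _append(self, key, data, deleted):
        versions = self._history.setdefault(key, [])
        next_version = versions[-1]["version"] + 1 if versions else 1
        versions.append({
            "version": next_version,
            "last_update": time.time(),
            "deleted": deleted,
            "data": copy.deepcopy(data),
        })

    def put(self, key, data):
        """Append a new version instead of overwriting the old one."""
        self._append(key, data, deleted=False)

    def delete(self, key):
        """Soft delete: record a tombstone version and keep the history."""
        self._append(key, None, deleted=True)

    def get(self, key):
        """Return the latest data, or None if the key is missing or deleted."""
        versions = self._history.get(key)
        if not versions or versions[-1]["deleted"]:
            return None
        return copy.deepcopy(versions[-1]["data"])

    def rollback(self, key):
        """Undo the latest update or delete by re-saving the prior version."""
        versions = self._history.get(key, [])
        if len(versions) < 2:
            raise ValueError("no earlier version to roll back to")
        previous = versions[-2]
        self._append(key, previous["data"], deleted=previous["deleted"])

    def prune(self, key, keep_last):
        """Discard all but the newest keep_last versions (keep_last >= 1)."""
        if keep_last < 1:
            raise ValueError("keep_last must be at least 1")
        versions = self._history.get(key, [])
        self._history[key] = versions[-keep_last:]
```

With real entities, each version would probably be its own entity (for example keyed by the original key plus the version number), and pruning would be a periodic job that deletes versions older than your retention period.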

Related

Is google Datastore recommended for storing logs?

I am investigating what might be the best infrastructure for storing log files from many clients.
Google App Engine offers a nice solution that doesn't turn the process into an IT nightmare: load balancing, sharding, servers, user authentication, all in one place with almost zero configuration.
However, I wonder whether the Datastore model is the right one for storing logs. Each log entry should be saved as a single document; each client uploads its entries on a daily basis, and an upload can consist of 100K log entries per day.
Plus, there are some limitations and open questions that could break the requirements:
60-second timeout on bulk transactions - How many log entries per second will I be able to insert? If 100K won't fit into the 60-second window, this will affect the design and the work that needs to be put into the server.
5 inserts per entity per second - Is a transaction considered a single insert?
Post-analysis - text search, searching for similar log entries across clients. How flexible and efficient is Datastore with these queries?
Real-time data fetch - getting all the recent log entries.
The other option is to deploy an Elasticsearch cluster on Google Compute and write our own server that fetches data from ES.
Thanks!
It is a bad idea to use Datastore for this, and even worse if you use entity groups with parent/child relationships, as a comment mentions when comparing performance.
Those numbers do not apply, but Datastore is not at all designed for what you want.
BigQuery is what you want. It is designed for this, especially if you later want to analyze the logs in a SQL-like fashion. Any more detail requires that you ask a specific question, as it seems you haven't read much about either service.
I do not agree. Datastore is a fully managed NoSQL document database. You can store the logs you want in it and query them directly in Datastore. The benefit of using it instead of BigQuery is that it is schemaless: in BigQuery you have to define the schema before inserting the logs, which is not necessary with Datastore. Think of Datastore as the Google Cloud counterpart of a MongoDB log-analysis use case.

AppEngine & BigQuery - Where would you put stat/monitoring data?

I have an AppEngine application that processes files from Cloud Storage and inserts them into BigQuery.
Because I would like to know the sanity/performance of the application, both now and in the future, I would like to store stats data in either Cloud Datastore or a Cloud SQL instance.
I have two questions I would like to ask:
Cloud Datastore vs Cloud SQL - what would you use and why? What downsides have you experienced so far?
Would you use a task or a direct call to insert the data, and why? That is, would you add a task and have some consumers insert the data, or would you do a direct insert (regardless of the solution chosen above)? What downsides have you experienced so far?
Thank you.
Cloud SQL is better if you want to perform JOINs or SUMs later; Cloud Datastore will scale better if you have a lot of data to store. Also, in the Datastore, if you want to update a stats entity transactionally, you will need to shard it or you will be limited to 5 updates per second.
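For readers who have not seen sharding before, the idea can be sketched as below. This is a plain in-memory illustration, not Datastore code: `ShardedCounter` and its `num_shards` default are hypothetical. In a real application each shard would be a separate entity, so that concurrent writers usually hit different entities.

```python
import random


class ShardedCounter:
    """Hypothetical counter whose total is spread over several shards."""

    def __init__(self, num_shards=20):
        # Each slot stands in for one shard entity.
        self._shards = [0] * num_shards

    def increment(self, amount=1):
        # Pick a random shard so that writes are spread out.
        index = random.randrange(len(self._shards))
        self._shards[index] += amount

    def value(self):
        # Reading the total means summing every shard.
        return sum(self._shards)
```

The trade-off is that writes become cheap and parallel while reads have to fetch and sum all shards, so the shard count should follow your expected write rate.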
If the data to insert is small (one row to insert in BQ or one entity in the Datastore), then you can do it with a direct call, but you must accept that the call may fail. If you want to retry in case of failure, or if the data to insert is big and will take time, it is better to run the insert asynchronously in a task. Note that with tasks you must be cautious, because they can run more than once.
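One common way to cope with tasks that run more than once is to make the insert idempotent, keyed by an identifier that stays the same across retries. A minimal sketch, assuming you can attach such an identifier to each task; `StatsSink`, `handle_task` and `task_id` are hypothetical names, and the dictionary stands in for the real table or kind:

```python
class StatsSink:
    """Hypothetical in-memory stand-in for the table or kind receiving stats."""

    def __init__(self):
        self.rows = {}

    def insert_once(self, task_id, row):
        """Store the row under task_id; return False if it was already stored."""
        if task_id in self.rows:
            return False
        self.rows[task_id] = row
        return True


def handle_task(sink, task_id, row):
    """Task body: safe to run repeatedly for the same task_id."""
    return sink.insert_once(task_id, row)
```

In the Datastore, the same effect could be had by using the identifier as the entity key, so that a repeated run overwrites the same entity instead of creating a second one.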

Preventing duplicates with MapReduce to BigQuery pipeline

I was reading the answer by Michael to this post here, which suggests using a pipeline to move data from datastore to cloud storage to big query.
Google App Engine: Using Big Query on datastore?
I want to use this technique to append data to a bigquery table. That means I have to have some way of knowing if the entities have been processed, so they don't get repeatedly submitted to bigquery during mapreduce runs. I don't want to rebuild my table each time.
The way I see it, I have two options. I can put a flag on the entities, update it when each entity is processed, and filter flagged entities out on subsequent runs; or I can save each entity to a new table and delete it from the source table. The second way seems superior, but I wanted to ask for options and see if there are any gotchas.
Assuming you have some stream of activity represented as entities, you can use query cursors to start one query where a prior one left off. Query cursors are perfect for the type of incremental situation you've described, because they avoid the overhead of marking entities as having been processed.
I'd have to poke around a bit to see if App Engine MapReduce supports cursors (I suspect that it doesn't, yet).
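The cursor pattern itself can be sketched without MapReduce. This is a simulation only: the list stands in for a query with a stable order in which new entities sort last (for example, ordered by creation time), and the integer position stands in for the opaque cursor a real query would return. `fetch_page` and `export_new_entities` are hypothetical names.

```python
def fetch_page(entities, cursor, page_size):
    """Return (batch, next_cursor), starting at the saved cursor position."""
    start = cursor or 0
    batch = entities[start:start + page_size]
    return batch, start + len(batch)


def export_new_entities(entities, saved_cursor, page_size=100):
    """Collect everything added since saved_cursor; return it with the new cursor."""
    exported = []
    cursor = saved_cursor
    while True:
        batch, cursor = fetch_page(entities, cursor, page_size)
        if not batch:
            break
        exported.extend(batch)
    return exported, cursor
```

You would persist the returned cursor after each successful load into BigQuery and pass it back in on the next run, so that only entities added since then are appended.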

How to do bulk deletes from Google App Engine Datastore

I need to bulk delete records from the Datastore. I went through all the previous links, but they all just talk about fetching the entities from the Datastore and then deleting them one by one. The problem in my case is that I have around 80K entities, and the read times out whenever I try to do it using the datastore db.delete() method.
Does anyone here by any chance know a method closer to SQL for performing a bulk delete?
You can use Task Queue + DB Cursor for deletion.
A task can run for up to 10 minutes, which is probably enough time to delete all the entities. If it takes longer, you can get the current cursor position and have the task enqueue itself again with this cursor as a parameter, so that processing resumes from the last position.
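The control flow of that approach looks roughly like this. It is a simulation rather than App Engine code: the dictionary stands in for the Datastore, the last deleted key stands in for a real query cursor, and the names (`fetch_keys`, `run_delete_task`) and the 540-second budget (a little under the 10-minute limit) are assumptions.

```python
import time


def fetch_keys(store, cursor, batch_size):
    """Return up to batch_size keys that sort after cursor, plus the new cursor."""
    keys = sorted(k for k in store if cursor is None or k > cursor)[:batch_size]
    new_cursor = keys[-1] if keys else cursor
    return keys, new_cursor


def run_delete_task(store, cursor=None, batch_size=500,
                    budget_seconds=540.0, clock=time.monotonic):
    """Delete batches until everything is gone or the time budget is spent.

    Returns None when finished, otherwise the cursor for the follow-up task.
    """
    deadline = clock() + budget_seconds
    while True:
        keys, cursor = fetch_keys(store, cursor, batch_size)
        if not keys:
            return None
        for key in keys:
            del store[key]
        if clock() >= deadline:
            # Out of time: the caller should enqueue a new task with this cursor.
            return cursor
```

In a real handler, a non-None return value is the point where the task would add itself to the queue again, passing the cursor along as a parameter.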
Define what API you're using. JDO? GAE? JPA? You refer to some db.delete, yet tag this as JDO; they are not the same. JDO obviously provides pm.deletePersistentAll(), and if you want more than that you can make use of the Google Mapper API.
You can use Cloud Dataflow to bulk delete entities in Datastore. You can use a GQL query to select the entities to delete:
https://cloud.google.com/datastore/docs/bulk-delete

Does Active Directory commitChanges method work the same as a DataBase Commit transaction?

I do not have a way to test this right now, so could you confirm the question in the title?
I mean, in an ADO.NET database transaction I can update/insert thousands of records before committing to the database. In Active Directory, using System.DirectoryServices, it seems I need to commit for every entry (or record) that I update/insert.
Thanks.
Active Directory is not a transactional store - so you don't have the transaction support like you have with a database.
Your observation is absolutely correct - with Active Directory, you deal on a per-object basis; you can retrieve an object, manipulate it, and then save back all the changes (or discard them) - but you don't have any transaction support to roll back a whole series of operations.
If you really must have this capability, you'd have to write your own Resource Manager for AD (see some ideas here in MSDN) - this would allow you to wrap your AD operations in a TransactionScope() and roll them back. I don't think this is a trivial undertaking, otherwise, someone would have done it already....
So your current observations are absolutely correct, and without a whole lot of effort, this cannot be changed, unfortunately.

Resources