How to clean up GAE production datastore? - google-app-engine

Is there an efficient way (in terms of the number of read/write operations) to:
delete all NDB datastore records of a particular kind;
delete everything in the datastore?

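from google.appengine.ext import ndb

# Fetch only the keys (not the full entities), then delete them in one call.
ndb.delete_multi(
    MyModel.query().fetch(keys_only=True)
)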
You need to do this for each model separately.
--OR--
If you have Datastore Admin enabled in your developer console, you can do this directly for all entities of any or all kinds.

The remote API is great for this sort of operation. The article below even includes an example of deleting all entities of a given kind.
https://developers.google.com/appengine/articles/remote_api
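For a large kind, the key list can be deleted in smaller batches rather than in one call. The following is a minimal sketch, not the method from either answer: `chunked`, `delete_in_batches` and the batch size of 500 are assumptions made for illustration, and an in-memory dict stands in for the datastore so the code runs anywhere. On App Engine you would pass `ndb.delete_multi` as the `delete_multi` argument and the result of `MyModel.query().fetch(keys_only=True)` as `keys`.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def delete_in_batches(keys, delete_multi, batch_size=500):
    """Call `delete_multi` once per batch; return the number of batches."""
    batches = 0
    for batch in chunked(list(keys), batch_size):
        delete_multi(batch)
        batches += 1
    return batches


# In-memory stand-in for the datastore (hypothetical, for illustration only).
fake_store = {"key%d" % i: object() for i in range(1200)}


def fake_delete_multi(batch):
    """Stand-in for ndb.delete_multi: remove each key from the dict."""
    for key in batch:
        fake_store.pop(key, None)


batches_used = delete_in_batches(list(fake_store), fake_delete_multi)
print(batches_used, len(fake_store))
```

Batching does not reduce the number of entities written, but it keeps each call small enough to retry on its own if it fails.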

Related

Moving designs docs,views and queries from one database to another in cloudant

I have queries, views and design docs in Cloudant. I want to move them to another database. Is there any way to do this?
Cloudant/CouchDB replication will copy all of the documents (including design documents) to the target database, as long as the user authenticating against the target database has 'admin' access, because you need admin access to write a design document.
There's no built-in way with Cloudant. If you want to write a NodeJS program, it's pretty easy. Here's a gist that deletes all documents except design docs. It could quickly be modified to copy the design docs to a different database. https://gist.github.com/rajrsingh/6044d58e2ae743d7ec5b

Does GAE datastore internally use memcache?

As you can see from the attached screenshot, the datastore asks memcache to delete an entry inside a put(). What's that?
The NDB datastore library, at least, uses memcache as one of its caches.
The pattern you observed is explained by this section of the documentation:
Memcache does not support transactions. Thus, an update meant to be
applied to both the Datastore and memcache might be made to only one
of the two. To maintain consistency in such cases (possibly at the
expense of performance), the updated entity is deleted from memcache
and then written to the Datastore.
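The quoted behaviour (delete from the cache, then write to the datastore) can be sketched as follows. This is a toy model under stated assumptions, not NDB's actual implementation: `CachedStore` is a hypothetical class, and two plain dicts stand in for memcache and the datastore.

```python
class CachedStore:
    """Toy model of a datastore fronted by a non-transactional cache."""

    def __init__(self):
        self.datastore = {}  # stand-in for the Datastore
        self.memcache = {}   # stand-in for memcache

    def put(self, key, value):
        # Invalidate the cached copy first, then write the new value,
        # so a reader never finds a stale entry left behind in the cache.
        self.memcache.pop(key, None)
        self.datastore[key] = value

    def get(self, key):
        # Serve from the cache when possible; otherwise read the
        # datastore and refill the cache for the next reader.
        if key in self.memcache:
            return self.memcache[key]
        value = self.datastore.get(key)
        if value is not None:
            self.memcache[key] = value
        return value


store = CachedStore()
store.put("user:1", "alice")
first = store.get("user:1")       # cache miss, read from the datastore
store.put("user:1", "alice v2")   # drops the cached "alice"
second = store.get("user:1")      # reads the fresh value
print(first, second)
```

This is why a memcache delete shows up inside a put(): the cache entry is removed rather than updated, and the next read repopulates it.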

How to keep memcache and datastore in sync

Suppose I have a million users registered with my app. Now there's a new user, and I want to show him which of his contacts have this app installed. A user can have many contacts, let's say 500. If I get an entity for each contact from the datastore, it is very time- and money-consuming. Memcache is a good option, but I have to keep it in sync for that kind. I can get dedicated memcache for such a large amount of data, but how do I sync it? My logic would be: if it's not in memcache, assume that the contact is not registered with this app. A backend module with manual scaling can be used to keep both in sync, but I don't know how good this design is. Any help will be appreciated.
This is not how memcache is designed to be used. You should never rely on memcache. Keys can drop at any time. Therefore, in your case, you can never be sure if a contact exists or not.
I don't know what your problem with datastore is? Datastore is designed to read data very fast - take advantage of it.
When new users install your app, create a lookup entity with the phone number as the key. You don't necessarily need any other properties. Something like this:
Entity contactLookup = new Entity("ContactLookup", "somePhoneNumber");
datastore.put(contactLookup);
That will keep a log of who's got the app installed.
Then, to check which of your user's contacts are already using your app, you can create a set of keys out of the phone numbers from the user's address book (with their permission of course!), and perform a batch get. Something like this:
Set<Key> keys = new HashSet<Key>();
for (String phoneNumber : phoneNumbers) {
    keys.add(KeyFactory.createKey("ContactLookup", phoneNumber));
}
Map<Key, Entity> entities = datastore.get(keys);
Now, entities will be those contacts that have your app installed.
You may need to batch the keys to reduce load. The Python API does this for you, but I'm not sure about the Java APIs. Even if your user has 500 contacts, it's only 5 batch gets (assuming batches of 100).
Side note: you may want to consider hashing phone numbers for storage.
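The batching and hashing suggestions above can be combined roughly as follows. This is a hedged sketch in Python rather than the Java used in the answer: `hash_phone`, `find_registered` and `fake_get_multi` are hypothetical names, SHA-256 is just one possible hash, and a set of hashed keys stands in for the `ContactLookup` entities. In a real app, `get_multi` would wrap the datastore's batch get.

```python
import hashlib


def hash_phone(phone_number):
    """Return a hex digest to use as the lookup key instead of the raw number."""
    return hashlib.sha256(phone_number.encode("utf-8")).hexdigest()


def find_registered(phone_numbers, get_multi, batch_size=100):
    """Return (registered phone numbers, number of batch calls made)."""
    registered = []
    calls = 0
    for start in range(0, len(phone_numbers), batch_size):
        batch = phone_numbers[start:start + batch_size]
        keys = [hash_phone(p) for p in batch]
        found = get_multi(keys)  # the subset of keys that exist
        calls += 1
        registered.extend(p for p, k in zip(batch, keys) if k in found)
    return registered, calls


# Stand-in for the ContactLookup kind: two registered users (made-up numbers).
registered_keys = {hash_phone("+15550000007"), hash_phone("+15550000042")}


def fake_get_multi(keys):
    """Stand-in for a datastore batch get: return the keys that exist."""
    return registered_keys.intersection(keys)


contacts = ["+1555%07d" % i for i in range(500)]
matches, calls = find_registered(contacts, fake_get_multi)
print(matches, calls)
```

With 500 contacts and batches of 100, the lookup issues five batch calls, matching the estimate in the answer.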
Memcache is a good option to reduce costs and improve performance, but you should not assume that it is always available. Even a dedicated Memcache may fail or an individual record can be evicted. Besides, all this synchronization logic will be very complicated and error-prone.
You can use Memcache to indicate if a contact is registered with the app, in which case you do not have to check the datastore for that contact. But I would recommend checking all contacts not found in Memcache in the Datastore.
Verifying if a record is present in a datastore is fast and inexpensive. You can use .get(java.lang.Iterable<Key> keys) method to retrieve the entire list with a single datastore call.
You can further improve performance by creating an entity with no properties for registered users. This way there will be no overhead in retrieving these entities.
Since you don't use Python and therefore don't have access to NDB, my suggestion would be that, when you add a user, you add him to memcache and start an async operation (or a task queue job) to push the same data to your datastore. That way memcache is updated first, and the datastore eventually follows. They'll always be in sync.
Then all you need to do is query memcache first when you do "gets" (because memcache is always in sync, since you push there first), and if memcache returns nothing (being volatile and whatnot), query the actual datastore to "refill" memcache.
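The "check memcache first, fall back to the datastore for misses, then refill" flow described in these answers can be sketched for a whole contact list at once. This is an illustration under stated assumptions: `lookup_contacts` is a hypothetical helper, and plain dicts stand in for memcache and the datastore. On App Engine the dict operations would become batch calls such as the Python memcache API's `get_multi` and `set_multi`, plus a datastore batch get.

```python
def lookup_contacts(keys, cache, datastore):
    """Return (found entries, keys that missed the cache)."""
    # 1. Take whatever the cache already knows.
    found = {k: cache[k] for k in keys if k in cache}
    # 2. Anything not cached may still exist: the cache can evict entries
    #    at any time, so a miss is not proof of absence.
    misses = [k for k in keys if k not in found]
    from_store = {k: datastore[k] for k in misses if k in datastore}
    # 3. Refill the cache so the next lookup is served from it.
    cache.update(from_store)
    found.update(from_store)
    return found, misses


cache = {"a": True}
datastore = {"a": True, "b": True}
found, misses = lookup_contacts(["a", "b", "c"], cache, datastore)
print(found, misses)
```

Only the keys that missed the cache reach the datastore, and a key absent from both (here "c") is the only one treated as unregistered.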

DjangoAppEngine and Eventual Consistency Problems on the High Replication Datastore

I am using djangoappengine and I think I have run into some problems with the way it handles eventual consistency on the High Replication Datastore.
First, entity groups are not even implemented in djangoappengine.
Second, I think that when you do a djangoappengine get, the underlying App Engine system is doing an App Engine query, which is only eventually consistent. Therefore, you cannot even assume consistency when using keys.
Assuming those two statements are true (and I think they are), how does one build an app of any complexity using djangoappengine on the high replication datastore? Every time you save a value and then try to get the same value, there is no guarantee that it will be the same.
Take a look in djangoappengine/db/compiler.py:get_matching_pk()
If you do a djangomodel.get() by the pk, it'll translate to a Google App Engine Get().
Otherwise it'll translate to a query. There's room for improvement here. Submit a fix?
I don't really know about djangoappengine, but an App Engine query that includes only the key is considered a keys-only query, and you will always get consistent results.
No matter what the system you put on top of the AppEngine models, it's still true that when you save it to the datastore you get a key. When you look up an entity via its key in the HR datastore, you are guaranteed to get the most recent results.

How to create database table in Google App Engine

How to create database table in Google App Engine
You don't. You create Entities of different kinds. Datastore is not a relational database[*].
If you want to imagine that GAE creates one "table" for each kind, the "columns" of that "table" being the properties of the entities, then you're welcome to do so. But I don't think it helps.
[*] I don't know whether it meets some technical definition, but it certainly doesn't drive like SQL-based databases.
According to http://code.google.com/appengine/docs/python/datastore/
App Engine Datastore is a schemaless object datastore providing
robust, scalable storage for your web application, with the following
features:
No planned downtime
Atomic transactions
High availability of reads and writes
Strong consistency for reads and ancestor queries
Eventual consistency for all other queries
The Python Datastore interface includes a rich data modeling API and a SQL-like query language called GQL.
In simple words: just create your model class and create an object of this class. After the first call of the put() method for this object, the "table" (I think the term here is kind) will be created on the fly. But you definitely have to read the documentation and check some examples. They will help you to understand the specifics of Google Datastore and how it differs from a common RDBMS.
In simple words, I would say that with Google Bigtable you don't need to create your tables because there are already six Bigtables ready to store whatever you want.
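The idea that a kind appears on the first put(), with no table-creation step, can be illustrated with a toy in-memory model. This is not the Datastore API: `ToyDatastore` and its methods are hypothetical and exist only to show the schemaless behaviour described above.

```python
class ToyDatastore:
    """Toy schemaless store: kinds come into existence on first put()."""

    def __init__(self):
        self.kinds = {}    # kind name -> {entity id -> properties}
        self._next_id = 1

    def put(self, kind, **properties):
        """Store an entity and return its key as a (kind, id) pair."""
        entity_id = self._next_id
        self._next_id += 1
        # No "create table" step: the kind is created here if it is new.
        self.kinds.setdefault(kind, {})[entity_id] = properties
        return (kind, entity_id)

    def get(self, key):
        """Look an entity up by its key; return None if it does not exist."""
        kind, entity_id = key
        return self.kinds.get(kind, {}).get(entity_id)


db = ToyDatastore()
key1 = db.put("Greeting", author="ann", content="hi")
key2 = db.put("Greeting", content="no author here")  # different properties
print(sorted(db.kinds), db.get(key1), db.get(key2))
```

Two entities of the same kind carry different sets of properties here, which is what "schemaless" means in practice and where the "table with columns" analogy breaks down.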
