GAE datastore cached entities under Python objects - google-app-engine

I have this code here: http://sprunge.us/TAjH?py
Why am I getting 10 instead of 1 (or whatever is in the DB)? Also, every newly retrieved Python object (entity) has the very same ID (and maybe even the same memory address). Why is that? How should I proceed in order to get distinct objects holding the real values stored in the DB?

Answered here: How do I refresh an NDB entity from the datastore?
So the solution is to pass use_cache=False when retrieving from the datastore.
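To illustrate why a cached read can return the same Python object with an unsaved value, here is a minimal toy model. It is not the NDB API: ToyContext, its dict-backed datastore, and the "counter" key are all made up for the illustration, and only the use_cache flag name is borrowed from the answer above.

```python
import copy


class ToyContext:
    """Toy stand-in for a per-request entity cache (NOT the real NDB API)."""

    def __init__(self, datastore):
        self._datastore = datastore  # dict: key -> dict of stored properties
        self._cache = {}             # key -> object handed out by an earlier get

    def get(self, key, use_cache=True):
        if use_cache and key in self._cache:
            # Cache hit: the very same object comes back, local edits included.
            return self._cache[key]
        # Cache miss or bypass: build a fresh object from the stored values.
        entity = copy.deepcopy(self._datastore[key])
        if use_cache:
            self._cache[key] = entity
        return entity


datastore = {"counter": {"value": 1}}
ctx = ToyContext(datastore)

first = ctx.get("counter")
first["value"] = 10  # changed in memory only, never written back

cached = ctx.get("counter")
fresh = ctx.get("counter", use_cache=False)

print(cached is first, cached["value"])  # True 10
print(fresh is first, fresh["value"])    # False 1
```

In this toy model the cached read hands back the object that was modified in memory, while the read that bypasses the cache reflects what is actually stored.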

Related

how to keep memcache and datastore in sync

Suppose I have a million users registered with my app. Now there's a new user, and I want to show him which of his contacts have this app installed. A user can have many contacts, let's say 500. If I fetch an entity for each contact from the datastore, it costs a lot of time and money. Memcache is a good option, but I have to keep it in sync for that Kind. I can get dedicated memcache for such a large amount of data, but how do I sync it? My logic would be: if it's not in memcache, assume that the contact is not registered with this app. A backend module with manual scaling can be used to keep both in sync. But I don't know how good this design is. Any help will be appreciated.
This is not how memcache is designed to be used. You should never rely on memcache. Keys can drop at any time. Therefore, in your case, you can never be sure if a contact exists or not.
I don't know what your problem with datastore is? Datastore is designed to read data very fast - take advantage of it.
When new users install your app, create a lookup entity with the phone number as the key. You don't necessarily need any other properties. Something like this:
Entity contactLookup = new Entity("ContactLookup", "somePhoneNumber");
datastore.put(contactLookup);
That will keep a log of who's got the app installed.
Then, to check which of your user's contacts are already using your app, you can create a set of keys out of the phone numbers from the user's address book (with their permission of course!), and perform a batch get. Something like this:
// Build one key per phone number, then fetch them all in a single batch get.
Set<Key> keys = new HashSet<Key>();
for (String phoneNumber : phoneNumbers) {
    keys.add(KeyFactory.createKey("ContactLookup", phoneNumber));
}
Map<Key, Entity> entities = datastore.get(keys);
Now, entities will be those contacts that have your app installed.
You may need to batch the keys to reduce load. The Python API does this for you, but I'm not sure about the Java APIs. But even if a user has 500 contacts, that's only 5 batch gets (assuming batches of 100).
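The batching arithmetic can be sketched in plain Python. The helper name chunked and the sample phone numbers are made up for the illustration; each resulting batch would then be passed to one batch get.

```python
def chunked(items, size=100):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Made-up sample data standing in for a user's address book.
phone_numbers = ["+1555%07d" % n for n in range(500)]

batches = list(chunked(phone_numbers, 100))
print(len(batches))  # 5
```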
Side note: you may want to consider hashing phone numbers for storage.
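A minimal sketch of the hashing idea, assuming SHA-256 and a simple digits-only normalization (both are my assumptions, not something the answer specifies). The function name lookup_key_name is hypothetical. Keep in mind that phone numbers form a small search space, so a plain unsalted hash can be brute-forced; treat this as obfuscation rather than strong protection.

```python
import hashlib


def lookup_key_name(phone_number):
    """Return a hex digest usable as the key name of a lookup entity."""
    # Assumed normalization: keep digits only, so formatting differences
    # in the address book do not produce different keys.
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    return hashlib.sha256(digits.encode("utf-8")).hexdigest()


print(lookup_key_name("+1 (555) 010-0000") == lookup_key_name("15550100000"))  # True
```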
Memcache is a good option to reduce costs and improve performance, but you should not assume that it is always available. Even a dedicated Memcache may fail or an individual record can be evicted. Besides, all this synchronization logic will be very complicated and error-prone.
You can use Memcache to indicate if a contact is registered with the app, in which case you do not have to check the datastore for that contact. But I would recommend checking all contacts not found in Memcache in the Datastore.
Verifying whether a record is present in the datastore is fast and inexpensive. You can use the .get(java.lang.Iterable<Key> keys) method to retrieve the entire list with a single datastore call.
You can further improve performance by creating an entity with no properties for registered users. This way there will be no overhead in retrieving these entities.
Since you don't use Python and therefore don't have access to NDB, my suggestion would be: when you add a user, add him to memcache and create an async query (or a task queue job) to push the same data to your datastore. That way memcache gets updated first, and the datastore eventually follows. They'll always be in sync.
Then all you need to do is query memcache first when you do "gets" (because memcache is always in sync, since you push there first), and if memcache returns empty (being volatile and whatnot), query the actual datastore to refill memcache.
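The "check the cache first, fall back to the datastore, then refill the cache" flow can be sketched with plain Python containers. Here a dict stands in for memcache and a set stands in for the datastore; find_registered is a hypothetical helper, not an App Engine API. A cache miss is never treated as "not registered".

```python
def find_registered(contacts, cache, datastore):
    """Return the subset of `contacts` that are registered users."""
    registered = {c for c in contacts if c in cache}
    misses = [c for c in contacts if c not in cache]
    for contact in misses:
        # A cache miss proves nothing, so ask the authoritative store.
        if contact in datastore:
            cache[contact] = True  # refill the cache for the next lookup
            registered.add(contact)
    return registered


cache = {"alice": True}       # volatile, may lose entries at any time
datastore = {"alice", "bob"}  # source of truth
result = find_registered(["alice", "bob", "carol"], cache, datastore)
print(sorted(result))  # ['alice', 'bob']
```

In a real app the loop over misses would be a single batch get rather than one lookup per contact.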

Does Objectify cache the Query.fetchKeys results in memcache?

I have limited seed-data in an entity, for which I want to fetch all keys (unique strings) frequently and whole entity not that frequent.
If I fetch the keys using Query.fetchKeys, does Objectify cache the results in memcache or hit the datastore every time for Query.fetchKeys results?
Query.fetchKeys() is a method from a very old version of Objectify.
But in answer to your question, all 'queries' (that is, anything besides get-by-key) must pass through to the datastore. Only the datastore knows what satisfies a query.

How to clean up GAE production datastore?

Is there any effective way (in terms of number of read/write operations) to:
delete all NDB datastore records of a particular kind;
delete everything in the datastore?
# Fetch only the keys of every MyModel entity, then delete them in one batch call.
ndb.delete_multi(
    MyModel.query().fetch(keys_only=True)
)
You need to do this for each model separately.
--OR--
If you have Datastore Admin enabled in your developer console, you can do this directly for all entities of any or all Kinds.
The remote API is great for this sort of operation. See the article below; it even includes an example for deleting all entities of a given kind.
https://developers.google.com/appengine/articles/remote_api

REST status code and eventual consistency?

I have a RESTful web service that runs on the Google App Engine, and uses JPA to store entities in the GAE Data Store.
New entities are created using a POST request (as the server will generate the entity ID).
However, I am uncertain as to the best status code to return, as the GAE DS is eventually consistent. I have considered the following:
200 OK: RFC states that the response body should contain “an entity describing or containing the result of the action”. This is achievable as the entity is updated with its generated ID when it is persisted to the DS, therefore it is possible to serialize and return the updated entity straight away. However, subsequent GET requests for that entity by ID may fail as all nodes may not yet have reached consistency (this has been observed as a real world problem for my client application).
201 Created: As above, returning a URI for the new entity may cause the client problems if consistency has not yet been reached.
202 Accepted: Would eliminate the problems discussed above, but would not be able to inform the client of the ID of the new entity.
What would be considered best practice in this scenario?
A get by key will always be consistent, so a 200 response would be OK based on your criteria unless there is a problem in Google land. Are you certain the problems you observed come from gets rather than queries? There is a difference between a query selecting a KEY and a GET by key.
For a query to be consistent it must be an ancestor query. Alternatively, a GET is consistent. Anything else may see inconsistent data, as indexes may not have been updated yet.
This is all assuming there isn't an actual problem in Google land. We have seen problems in the past where datacenters were late replicating and eventual consistency was very late, sometimes even by hours.
But you have no way of knowing that, so you either have to assume all is OK, or take an extremely pessimistic approach.
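The difference between a get by key and a non-ancestor query can be illustrated with a toy model in which the entity is stored immediately but the index that queries read from is updated later. ToyDatastore and all of its methods are invented for this sketch; it is a deliberately simplified picture of the behaviour described above, not how the real datastore is implemented.

```python
class ToyDatastore:
    """Toy model: entity writes are immediate, index updates lag behind."""

    def __init__(self):
        self.entities = {}  # key -> entity, written synchronously by put()
        self.index = set()  # keys visible to queries, updated later
        self._pending = []  # index updates that have not been applied yet

    def put(self, key, entity):
        self.entities[key] = entity
        self._pending.append(key)

    def get(self, key):
        # Get by key reads the entity itself, so it sees the write at once.
        return self.entities.get(key)

    def query_keys(self):
        # A (non-ancestor) query only sees what the index already contains.
        return set(self.index)

    def apply_pending_index_updates(self):
        self.index.update(self._pending)
        self._pending = []


ds = ToyDatastore()
ds.put("Order/42", {"total": 10})

print(ds.get("Order/42"))  # {'total': 10}
print(ds.query_keys())     # set()

ds.apply_pending_index_updates()
print(ds.query_keys())     # {'Order/42'}
```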
It depends on which JSON REST protocol you are using. Just always returning a JSON object is not very RESTful.
You should look at some of these:
jsonapi.org/format/
http://stateless.co/hal_specification.html
http://amundsen.com/media-types/collection/
To answer your question:
I would prefer using a format where the resource itself is aware of its URL, so I would use 201 but also return the whole resource.
The easiest way would be to use jsonapi with a convenient URL scheme, so that you are able to find a resource by URL because you know the ID.

Visual understanding of the GAE datastore

I am trying to understand how the Google App Engine (GAE) datastore is designed and how to use it. I am having a bit of a hard time visualising the structure from the description on the getting started page.
Can somebody explain the datastore using figures for us visually oriented people? Or point to a good tutorial, again with visual learning in mind?
I am specifically looking for answers with diagrams/figures that explain how GAE is used.
The 2008 IO session "Under the Covers of the Google App Engine Datastore" has a good visual overview of the datastore.
https://sites.google.com/site/io/under-the-covers-of-the-google-app-engine-datastore
http://snarfed.org/datastore_talk.html
For more IO talks go to:
https://developers.google.com/appengine/docs/videoresources
Very simplified, my understanding is that the GAE datastore can be viewed as a hashmap of hashmaps.
That said you could view it like this:
I guess there's no correct answer here, just different mental models. Depending on your programming background you may find mine enlightening, disturbing or both. I picture the datastore as a single huge distributed key-value collection of buckets that comprises all entity data of any kind in any namespace and all GAE apps of all users. A single bucket is called an entity group. It has a root key which (under the hood) consists of your appID, a namespace, a kind, and an entity ID or name. In an entity group reside one or more entities which have keys extending the root key. The entity belonging to the root key itself may or may not exist. Operations within a single entity group are atomic (transactional). An entity is a simple map-like data structure. The 2 built-in indexes (ascending and descending) again are 2 giant sorted collections of index entries. Each index entry is a data structure of appID, namespace, kind, property name, property type, property value, entity key, in that order.
Each (auto-)indexed value of each property of each entity creates 2 such index entries. There's another index with just entity keys in it. Custom indexes however go to yet another sorted collection with entries containing appID,namespace,index type,combined index value, entity key. That's the only part of the whole datastore that uses meta-data. It stores an index definition which tells the store how the combined index value is formed from the entity. This is the picture that's burnt into my mind and from which I know how to make the datastore happy.
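The mental model above can be sketched in a few lines of Python. Everything here is illustrative: the app ID "my-app", the Person entities, and the tuple layout simply follow the field order given in the description, and a sorted list plus bisect stands in for the giant sorted index. It shows how a filter such as "age >= 30" becomes a range scan over sorted index entries.

```python
import bisect


def index_entry(app_id, namespace, kind, name, value, entity_key):
    """Build one index entry in the field order described above."""
    return (app_id, namespace, kind, name, type(value).__name__, value, entity_key)


# Made-up entities: entity key -> map-like property data.
entities = {
    "Person/alice": {"age": 31, "city": "Oslo"},
    "Person/bob": {"age": 25, "city": "Bergen"},
    "Person/carol": {"age": 40, "city": "Oslo"},
}

# One entry per property value, kept in sorted order.
ascending = sorted(
    index_entry("my-app", "", "Person", name, value, key)
    for key, props in entities.items()
    for name, value in props.items()
)
descending = ascending[::-1]  # the second built-in index, same entries reversed

# "age >= 30" as a range scan: seek to the first matching entry, then read
# forward until the entries no longer belong to the same property.
prefix = ("my-app", "", "Person", "age", "int")
start = bisect.bisect_left(ascending, prefix + (30,))
matches = []
for entry in ascending[start:]:
    if entry[:5] != prefix:
        break
    matches.append(entry[6])

print(matches)  # ['Person/alice', 'Person/carol']
```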
