Modifying and reading entity inside ndb transaction - google-app-engine

Is it guaranteed that if I put an entity inside an NDB transaction and then read it (in the same NDB txn), I'll get the recently written entity?
The docs (https://cloud.google.com/appengine/docs/java/datastore/transactions#Java_Isolation_and_consistency) mention:
Unlike with most databases, queries and gets inside a Datastore
transaction do not see the results of previous writes inside that
transaction. Specifically, if an entity is modified or deleted within
a transaction, a query or get returns the original version of the
entity as of the beginning of the transaction, or nothing if the
entity did not exist then.
So the bare Google Datastore (without NDB) wouldn't return the recently written entity.
But ndb caches entities (https://cloud.google.com/appengine/docs/python/ndb/transactions):
Transaction behavior and NDB's caching behavior can combine to confuse
you if you don't know what's going on. If you modify an entity inside
a transaction but have not yet committed the transaction, then NDB's
context cache has the modified value but the underlying datastore
still has the unmodified value.
To make this a bit clearer, I wrote a piece of code that works fine:
from google.appengine.ext import ndb

class MyModel(ndb.Model):
    pass

@ndb.transactional()
def test():
    new_entity = MyModel()
    key = new_entity.put()
    assert key.get() is not None

test()
And my question is: how reliable is NDB caching inside an NDB transaction?

Your test will fail if you turn off caching:
class MyModel(ndb.Model):
    _use_cache = False
But it works with context caching enabled.
Just as it's explained in the docs.
It doesn't make any sense to write and then get the same entity inside a transaction anyway.

The NDB Caching article (https://cloud.google.com/appengine/docs/python/ndb/cache#incontext) says:
The in-context cache persists only for the duration of a single
thread. This means that each incoming HTTP request is given a new
in-context cache and is "visible" only to the code that handles
that request. If your application spawns any additional threads while
handling a request, those threads will also have a new, separate
in-context cache.
The in-context cache is fast; this cache lives in memory. When an NDB
function writes to the Datastore, it also writes to the in-context
cache. When an NDB function reads an entity, it checks the in-context
cache first. If the entity is found there, no Datastore interaction
takes place.
So it is guaranteed that the in-context cache stays alive for the duration of an NDB transaction (as long as you don't deliberately turn caching off).
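For illustration, a minimal sketch of the failure mode described above (the UncachedModel class is hypothetical, introduced only for this example):

from google.appengine.ext import ndb

class UncachedModel(ndb.Model):
    _use_cache = False  # bypass NDB's in-context cache

@ndb.transactional()
def test_uncached():
    key = UncachedModel().put()
    # Without the context cache, the get falls through to the Datastore,
    # which serves the snapshot taken at transaction start, so the
    # freshly written entity is not visible yet.
    assert key.get() is None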

Related

Does the Google Cloud Datastore create transactions implicitly?

In many databases, when an operation is performed without explicitly starting a transaction, the database creates a new transaction implicitly.
Does the datastore do this?
If it does not, is there any model for reasoning about how the data changes in the absence of transactions? How do puts, fetches, and reads work outside of transactions?
If it does, is there any characterization of when and how? Does it always do so? What is the scope of the transaction?
A mutation (put, delete) of a single entity will always be atomic (succeed entirely or fail entirely). You can think of the single mutation as transactional, even if you did not provide a transaction.
However, if you send multiple mutations in the same non-transactional request, that overall request is not atomic. Each mutation may succeed or fail independently -- one failure will not cause the other mutations to be reverted.
"Transactions are an optional feature of the Datastore; you're not required to use transactions to perform Datastore operations."
so there are no automatic transactions being opened for you across more than a single entity datastore operation.
a single entity commit will behave the same as a transaction internally. so if you are changing more than one entity or committing it more than once, its as if you open and close a transaction every time.
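For multi-entity atomicity, you have to open a transaction yourself. A minimal sketch with Python NDB (the Account model and transfer function are hypothetical, for illustration only):

from google.appengine.ext import ndb

class Account(ndb.Model):
    balance = ndb.IntegerProperty(default=0)

@ndb.transactional(xg=True)  # cross-group: entities in different entity groups
def transfer(key_from, key_to, amount):
    src, dst = key_from.get(), key_to.get()
    src.balance -= amount
    dst.balance += amount
    # Both puts commit atomically, or neither does.
    ndb.put_multi([src, dst])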

Use of memcache with NDB

I'm moving from db to ndb and I have a small doubt.
I've read the Caching docs but I need a clarification or a confirmation: with db I was using memcache to save my "views" and avoid hitting the datastore. NDB caches entity reads and writes, but does it also cache any read I do?
E.g.: items = Item.query().fetch(100) gives me my items. Is this query cached by NDB automatically? If two users want to see the items in my list, will the second read hit the NDB cache?
NDB only caches get() calls, where you fetch entities by key.
Queries are not cached.
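A common workaround is to run a keys-only query and then fetch the entities by key, so the gets can be served from NDB's caches. A sketch, assuming the Item model from the question:

from google.appengine.ext import ndb

# The query itself is never cached, but a keys-only query is cheap.
keys = Item.query().fetch(100, keys_only=True)
# get_multi() checks the in-context cache and Memcache before the Datastore.
items = ndb.get_multi(keys)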

How do I ensure consistency between MemCache and Datastore on GAE?

I am writing multiple entities to the datastore using a transaction. I want to keep these entities in MemCache, also. How do I ensure that the copy of the entity in MemCache actually equals the copy in the Datastore?
E.g. I can do:
tx.begin()
datastore.put(entity)
if (memcache.putIfUntouched(key, entity))
    tx.commit()
But then if the transaction fails the entity will possibly end up in the MemCache but not in the Datastore. On the other hand, if I do:
tx.begin()
datastore.put(entity)
tx.commit()
memcache.putIfUntouched(key, entity)
then the Datastore transaction might succeed but the MemCache update could fail. How can I ensure consistency?
From my experience, it may not be that helpful to write to the DB and the cache at the same time. In general, mixing DB transactions with other things (e.g. the file system) is difficult to get right.
I suggest you change your program logic so that:
When you create a new record, write only to the DB.
When you update an existing record, write to the DB and invalidate the corresponding slots in the cache.
When you're looking for a record, check the cache first; if it's not there, load it from the DB and fill in the cache.
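A minimal sketch of that cache-aside pattern with Python NDB and the memcache API (the Record model and the use of the urlsafe key as the cache key are assumptions for illustration):

from google.appengine.api import memcache
from google.appengine.ext import ndb

class Record(ndb.Model):
    value = ndb.StringProperty()

def update_record(key, new_value):
    entity = key.get()
    entity.value = new_value
    entity.put()
    # Invalidate rather than update: the next read repopulates the cache.
    memcache.delete(key.urlsafe())

def get_record(key):
    cached = memcache.get(key.urlsafe())
    if cached is not None:
        return cached
    entity = key.get()
    if entity is not None:
        memcache.add(key.urlsafe(), entity)
    return entity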

Write/Read with High Replication Datastore + NDB

So I have been reading a lot of documentation on HRD and NDB lately, yet I still have some doubts regarding how NDB caches things.
Example case:
Imagine a case where a user writes data and the app needs to fetch it immediately after the write. E.g. a user creates a "Group" (similar to a Facebook/LinkedIn group) and is redirected to the group immediately after creating it. (For now, I'm creating a group without assigning it an ancestor.)
Result:
When testing this sort of functionality locally (having enabled high replication), the immediate fetch of the newly created group fails. A NoneType is returned.
Question:
Having gone through the High Replication docs and Google I/O videos, I understand that there is a higher write latency; however, shouldn't NDB caching take care of this? I.e. a write is cached and then asynchronously written to disk, so an immediate read would be served from the cache and there should be no problem. Do I need to enforce some other settings?
Pretty sure you are running into the HRD feature where queries are "eventually consistent". NDB's caching has nothing to do with this behavior.
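Gets by key are strongly consistent, and so are ancestor queries. A sketch of the difference (the Group model and parent key are assumptions for illustration):

from google.appengine.ext import ndb

class Group(ndb.Model):
    name = ndb.StringProperty()

# A global query may not see a just-committed write (eventual consistency):
maybe_stale = Group.query(Group.name == 'hiking').fetch()

# An ancestor query is strongly consistent, but requires the entities
# to have been created under a common parent key:
parent = ndb.Key('GroupList', 'default')
fresh = Group.query(ancestor=parent).fetch()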
I suspect it might be because of the redirect that the NoneType is returned.
https://developers.google.com/appengine/docs/python/ndb/cache#incontext
The in-context cache persists only for the duration of a single incoming HTTP request and is "visible" only to the code that handles that request. It's fast; this cache lives in memory. When an NDB function writes to the Datastore, it also writes to the in-context cache. When an NDB function reads an entity, it checks the in-context cache first. If the entity is found there, no Datastore interaction takes place.
Queries do not look up values in any cache. However, query results are written back to the in-context cache if the cache policy says so (but never to Memcache).
So you are writing the value to the in-context cache, then redirecting, and the read fails because the redirect arrives as a different HTTP request, which gets a different (empty) cache.
I'm reaching the limit of my knowledge here, but I'd suggest initially that you try doing the create in a transaction and redirecting when it completes successfully.
https://developers.google.com/appengine/docs/python/ndb/transactions
Also, when you put the group model into the datastore you'll get a key back. Can you pass that key (via urlsafe, for example) to the redirect? Then you're guaranteed to retrieve the data, since you have its explicit key. It can't have a key if it's not in the datastore, after all.
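A minimal sketch of that key-passing approach with webapp2-style handlers (the handler names and route shape are hypothetical):

from google.appengine.ext import ndb
import webapp2

class Group(ndb.Model):
    name = ndb.StringProperty()

class CreateGroup(webapp2.RequestHandler):
    def post(self):
        key = Group(name=self.request.get('name')).put()
        # Pass the key through the redirect; a get by key is strongly
        # consistent, unlike a global query.
        self.redirect('/group/%s' % key.urlsafe())

class ShowGroup(webapp2.RequestHandler):
    def get(self, urlsafe_key):
        group = ndb.Key(urlsafe=urlsafe_key).get()
        self.response.write(group.name)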
Also, I'd suggest trying it as-is on the production server; behaviour can be very different locally versus in production.

Appengine memcache expiration with Objectify

I am using Objectify on App Engine with the Java runtime. I am also using memcache via the @Cached annotation for several entities. My question is about the behavior of Objectify when putting objects into the datastore. When putting an entity that has the @Cached annotation, is the memcache updated? Or is any existing cached data for that entity now out of sync with the datastore? I would like the memcache to be updated when I put an object into the datastore, but I don't know if Objectify does this by default or if I need to write it myself. If the memcache is updated, then I can have a much higher expiration time (or no expiration) for my data. FYI, I am not using transactions.
When you use @Cached, Objectify handles all updates to the memcache for you in a near-transactionally safe manner. It's "near-transactional" because while it won't fall apart under contention, there are rare circumstances under which it could go out of sync - for example, if you hit a DeadlineExceededException or OutOfMemoryException and Google terminates your VM.
Long expiration times are reasonable for most types of cached data.
Using Objectify, your data in Memcache will never be out of sync with the Datastore (except in some older versions, and in exceptional circumstances such as a really unlucky deadline).
I believe Objectify will just invalidate the Memcache entry (so the next "get" will go to the Datastore and write to the Memcache), rather than updating it, though I'm not sure about this bit. Either way, Objectify sorts it out for you.
