I am using Objectify on App Engine with the Java runtime. I am also using memcache via the @Cached annotation for several entities. My question is about the behavior of Objectify when putting objects into the datastore. When putting an entity that has the @Cached annotation, is the memcache updated, or is any existing cached data for that entity now out of sync with the datastore? I would like to have the memcache updated when I put an object into the datastore, but I don't know whether Objectify does this by default or whether I need to write it myself. If the memcache is updated, then I can use a much higher expiration time (or no expiration) for my data. FYI, I am not using transactions.
When you use @Cached, Objectify handles all updates to the memcache for you in a near-transactionally safe manner. It's "near-transactional" because while it won't fall apart under contention, there are rare circumstances under which it could go out of sync - for example, if you hit a DeadlineExceededException or OutOfMemoryError and Google terminates your VM.
Long expiration times are reasonable for most types of cached data.
Using Objectify, your data in Memcache will never be out of sync with the Datastore (except in some older versions and in exceptional circumstances, such as a really unlucky deadline).
I believe Objectify will just invalidate the Memcache entry (so the next "get" will go to the Datastore and write to the Memcache) rather than update it, though I'm not sure about this bit. Either way, Objectify sorts it out for you.
Related
Is it guaranteed that if I put an entity inside an ndb transaction and then read it (in the same ndb transaction), I'll get the recently written entity?
The documentation at https://cloud.google.com/appengine/docs/java/datastore/transactions#Java_Isolation_and_consistency mentions:
Unlike with most databases, queries and gets inside a Datastore
transaction do not see the results of previous writes inside that
transaction. Specifically, if an entity is modified or deleted within
a transaction, a query or get returns the original version of the
entity as of the beginning of the transaction, or nothing if the
entity did not exist then.
So bare Google Datastore (without ndb) wouldn't return the recently written entity.
But ndb caches entities (https://cloud.google.com/appengine/docs/python/ndb/transactions):
Transaction behavior and NDB's caching behavior can combine to confuse
you if you don't know what's going on. If you modify an entity inside
a transaction but have not yet committed the transaction, then NDB's
context cache has the modified value but the underlying datastore
still has the unmodified value.
To be a bit more clear, I wrote a piece of code which works fine:
@ndb.transactional()
def test():
    new_entity = MyModel()
    key = new_entity.put()
    assert key.get() is not None

test()
And my question is: how reliable is ndb caching inside an ndb transaction?
Your test will fail if you turn off caching:
class MyModel(ndb.Model):
    _use_cache = False
But it works with context caching enabled.
Just like it's explained in the docs.
It does not make any sense to write and then get the same entity inside a transaction anyway.
NDB Caching article (https://cloud.google.com/appengine/docs/python/ndb/cache#incontext) says:
The in-context cache persists only for the duration of a single
thread. This means that each incoming HTTP request is given a new
in-context cache and is "visible" only to the code that handles
that request. If your application spawns any additional threads while
handling a request, those threads will also have a new, separate
in-context cache.
The in-context cache is fast; this cache lives in memory. When an NDB
function writes to the Datastore, it also writes to the in-context
cache. When an NDB function reads an entity, it checks the in-context
cache first. If the entity is found there, no Datastore interaction
takes place.
So it is guaranteed that the in-context cache will stay alive for the duration of an ndb transaction (unless you turn off caching deliberately).
I have been using Google App Engine for a few months now, and I have recently come to doubt some of my practices with regard to the Datastore. I have around 10 entities with 10-12 properties each. Everything works well in my app, and the code is pretty straightforward with the way I have my data structured, but I am wondering if I should break up these large entities into smaller ones, either to optimize reads and writes or just to follow best practices (which I am not sure of regarding GAE).
Right now I am over my quotas for reads and writes and would like to keep those in check.
Optimizing Reads:
If you use an offset in a query, the offset entities are counted as reads. If you run a query where offset=100, the datastore retrieves and discards the first 100 entities and you are billed for those reads. Use cursors wherever possible to reduce read ops. Cursors will also result in faster queries.
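As a rough sketch of the cursor approach with NDB (the Article model and the page size of 20 are purely for illustration):

from google.appengine.ext import ndb

class Article(ndb.Model):  # hypothetical kind, just for illustration
    created = ndb.DateTimeProperty(auto_now_add=True)

def first_page():
    # fetch_page returns (results, cursor, more); you only pay for the
    # entities actually returned - there are no skipped "offset" reads
    articles, cursor, more = Article.query().order(-Article.created).fetch_page(20)
    return articles, cursor.urlsafe() if cursor else None

def next_page(cursor_token):
    # resume exactly where the previous page ended
    start = ndb.Cursor(urlsafe=cursor_token)
    articles, cursor, more = Article.query().order(-Article.created).fetch_page(
        20, start_cursor=start)
    return articles, cursor.urlsafe() if cursor else None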
NDB won't necessarily reduce reads when you are running queries. Queries are made against the datastore and entities are returned, no memcache interaction occurs. If you want to retrieve entities from memcache in the context of a query, you will need to run a keys_only query and then attempt to retrieve those keys from memcache. You would then need to go to the datastore for any entities that were cache misses. Retrieving a key is a "small" op which is 1/7 the cost of a read op.
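A minimal sketch of that pattern in NDB (the helper name is made up; get_multi goes through NDB's caching layers before touching the datastore):

from google.appengine.ext import ndb

def fetch_with_cache(query, limit=100):
    # keys-only query: each returned key is billed as a "small" op
    keys = query.fetch(limit, keys_only=True)
    # get_multi checks the in-context cache and memcache first and only
    # reads the cache misses from the datastore
    return ndb.get_multi(keys)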
Optimizing Writes:
Remove unused indexes. By default every property on your entity is indexed and each of those incurs 2 writes the first time it is written and 4 writes whenever it is modified. You can disable indexing for a property like so: firstname = db.StringProperty(indexed=False).
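For example (a hypothetical model; only the property you actually filter or sort on stays indexed):

from google.appengine.ext import db

class Person(db.Model):
    lastname = db.StringProperty()                # indexed, can be queried
    firstname = db.StringProperty(indexed=False)  # unindexed, cheaper writes
    bio = db.TextProperty()                       # TextProperty is never indexed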
If you use list properties, each item in the list is stored as an individual property on the entity. The list properties are abstractions provided for convenience. A list property named things with the value ["thing1", "thing2"] is really two properties in the datastore: things_0="thing1" and things_1="thing2". This can get really expensive when combined with indexing.
Consolidate properties that you don't need to query. If you only need to query on one or two properties, serialize the rest of the properties and store them as a single blob on the entity.
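One way to do this with NDB is a JsonProperty, which is stored as a single unindexed blob (the model and field names below are made up):

from google.appengine.ext import ndb

class Profile(ndb.Model):
    username = ndb.StringProperty()   # the one property you query on
    details = ndb.JsonProperty()      # everything else, one unindexed blob

profile = Profile(username='alice',
                  details={'city': 'Zurich', 'age': 30, 'languages': ['en', 'de']})
profile.put()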
Further reading:
https://developers.google.com/appengine/docs/billing#Billable_Resource_Unit_Costs
https://developers.google.com/appengine/docs/python/datastore/entities#Understanding_Write_Costs
I would recommend looking into using NDB Entities. NDB will use the in-context cache (and Memcache if need be) before resorting to performing reads/writes to the Datastore. This should help you stay within your quota.
Read here for more information on how NDB uses caching: https://developers.google.com/appengine/docs/python/ndb/cache
And please consult this page for a discussion of best practices with regards to GAE: https://developers.google.com/appengine/articles/scaling/overview
App Engine Datastore charges a fixed amount per entity read, no matter how large the entity is (although there is a maximum size of 1 MB). This means it makes sense to combine multiple entities that you often read together into a single one. The only downside is that latency increases (as a larger entity needs to be deserialized each time). I found this latency to be quite low (low single-digit milliseconds, even for large entities).
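In NDB terms, one way to fold data you always read together into a single entity is a LocalStructuredProperty, which serializes the embedded model inside the parent entity (the kinds below are hypothetical):

from google.appengine.ext import ndb

class Address(ndb.Model):   # embedded only, never stored as its own entity
    street = ndb.StringProperty()
    city = ndb.StringProperty()

class Customer(ndb.Model):
    name = ndb.StringProperty()
    # stored inside the Customer entity, so reading a Customer costs a
    # single read op and brings the address data along with it
    address = ndb.LocalStructuredProperty(Address)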
Using a framework on top of the Datastore is a good idea. I am using Objectify and am very happy with it. Use the Memcache integration with care, though. Google provides only a fixed, limited amount of memory to each application, so as soon as you are dealing with larger amounts of data this will not solve your problem (since entities get evicted from Memcache and need to be re-read from the datastore and re-cached on every read).
So I have been reading a lot of documentation on HRD and NDB lately, yet I still have some doubts regarding how NDB caches things.
Example case:
Imagine a case where a user writes data and the app needs to fetch it immediately after the write. E.g. a user creates a "Group" (similar to a Facebook/LinkedIn group) and is redirected to the group immediately after creating it. (For now, I'm creating a group without assigning it an ancestor.)
Result:
When testing this sort of functionality locally (with high replication enabled), the immediate fetch of the newly created group fails; None is returned.
Question:
Having gone through the High Replication docs and Google I/O videos, I understand that there is a higher write latency; however, shouldn't NDB caching take care of this? That is, a write is cached and then asynchronously written to disk, so an immediate read should be served from the cache and there should be no problem. Do I need to enforce some other settings?
Pretty sure you are running into the HRD feature where queries are "eventually consistent". NDB's caching has nothing to do with this behavior.
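A minimal sketch of the difference, using the question's Group kind (assumed here to have just a name property):

from google.appengine.ext import ndb

class Group(ndb.Model):
    name = ndb.StringProperty()

key = Group(name='my group').put()

# a get by key is strongly consistent, so this sees the new entity
group = key.get()

# a non-ancestor query is only eventually consistent: run immediately
# after the put, it may or may not include the new group yet
maybe_stale = Group.query(Group.name == 'my group').fetch()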
I suspect it might be because of the redirect that None is returned.
https://developers.google.com/appengine/docs/python/ndb/cache#incontext
The in-context cache persists only for the duration of a single incoming HTTP request and is "visible" only to the code that handles that request. It's fast; this cache lives in memory. When an NDB function writes to the Datastore, it also writes to the in-context cache. When an NDB function reads an entity, it checks the in-context cache first. If the entity is found there, no Datastore interaction takes place.
Queries do not look up values in any cache. However, query results are written back to the in-context cache if the cache policy says so (but never to Memcache).
So you are writing the value to the cache, redirecting, and the read then fails because the redirect is a different HTTP request and therefore has a different (empty) in-context cache.
I'm reaching the limit of my knowledge here, but I'd suggest initially that you try doing the create in a transaction and redirecting on completion/success.
https://developers.google.com/appengine/docs/python/ndb/transactions
Also, when you put the group model into the datastore, you'll get a key back. Can you pass that key (via urlsafe, for example) to the redirect? Then you'll be guaranteed to retrieve the data, as you have its explicit key. It can't have a key if it's not in the datastore, after all.
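A minimal webapp2-style sketch of that suggestion (the handler names, route, and Group kind are assumptions, not the asker's actual code):

import webapp2
from google.appengine.ext import ndb

class Group(ndb.Model):
    name = ndb.StringProperty()

class CreateGroup(webapp2.RequestHandler):
    def post(self):
        key = Group(name=self.request.get('name')).put()
        # hand the explicit key to the next request via the URL
        self.redirect('/group?key=%s' % key.urlsafe())

class ShowGroup(webapp2.RequestHandler):
    def get(self):
        # a get by key is strongly consistent, so the new group is found
        # even though the redirect is a separate request with a fresh cache
        group = ndb.Key(urlsafe=self.request.get('key')).get()
        self.response.write(group.name)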
Also, I'd suggest trying it as-is on the production server; behaviour can be very different locally and in production.
I've noticed this behavior, which results in consecutive gets succeeding.
Has anybody else seen this?
I've found a way to remove a single entity from memcache. It's painful, but it works.
Now, I use Java and Objectify but I hope you'll find this useful, whatever environment and language you use.
Go to the page https://console.cloud.google.com/appengine/memcache for your project.
Enter under Namespace the value "ObjectifyCache", or whatever namespace you use.
Under Key Type, select Java String.
This is the tricky bit: under Key, you've got to enter the "URL-safe key" that you'll find on the Datastore Edit page for your entity (https://console.cloud.google.com/datastore/entities/edit).
Click on Find, and hopefully an entity will appear.
Check the box, and click on DELETE.
If you now click on Find again, nothing will come up.
If you're using the high-replication datastore, gets immediately after deletes may succeed and return stale results. It can take up to a few seconds for the results of each operation to become visible to other operations.
Memcache operates independently of the datastore. Some libraries, like Objectify, connect them. If you're using Objectify to cache entities and you delete something from outside of Objectify (e.g. via the data viewer), you'll have to update your cache yourself. This happens to me occasionally, and I just wipe the whole memcache.
You have to find a way to work with this behavior. The simplest (expensive and really slow) method, for example, would be to wait ten seconds after every datastore operation. Better methods might use a cache to return freshly stored or deleted entities.
I am currently studying whether it would be suitable for me to use Ehcache in Google App Engine, and I have one specific need.
I am building a game where the game state would be updated every turn. Currently, after each action I update the Memcache, then the datastore. And each turn, I load the state of the game first from the cache, then from the datastore if the cache is empty.
The reason I have to update the datastore each time is that there is no guarantee that the object won't have been purged from the cache.
My concern is that, most of the time (i.e. as long as the object is not evicted from the cache), all these datastore saves are useless.
I am thus looking for a way to trigger the datastore save only once, before the object is evicted from the cache.
It seems that this is not possible using GAE Memcache. I had a look at Ehcache, but it only provides notifications after the element has been removed. And, as per the documentation, "only what was the key of the element is known", which doesn't go well with what I want to do.
Has anyone already faced the same need? How have you handled it?
Thanks in advance for any hint
No, there's no way to be notified before an element is evicted from cache on App Engine. There's also no way to install an alternate caching system like EHCache.
Memcache is, as the name implies, a caching system. Even with an eviction mechanism, you shouldn't ever rely on it as primary storage.