Referring to Memcache Objects And Updating them - google-app-engine

This is a very general question regarding GAE / Java Memcache:
I have an object called Network which is stored in Memcache, and another object called User which holds a reference to a Network.
Network network=Memcache.get(networkKey);
user._network=network; //user references an object that came from Memcache...
Now I have a bunch of code that changes user._network (without touching Memcache):
user._network = ... // do some changes to the object, without touching Memcache
OK, so here's the question:
Looking at the code as it is, did the last line update the network object in Memcache? I have servlets that need to access the updated network object; the question is whether I'm wrong to think of Memcache objects as regular objects.
Perhaps the right way is this?
Network network=Memcache.get(networkKey);
network.doUpdate();
Memcache.put(key,network);

The latter code example you provided is the correct way to update the network object in memcache, but it may work differently than you expect.
Your user object refers to an instance of the network object that was retrieved from memcache. Each time you retrieve the object from memcache you get a different instance, and because those instances are logically distinct from the value held in the cache, they fall out of sync as soon as the cached value is updated.
For example:
Network network=Memcache.get(networkKey);
Network networkTwo = Memcache.get(networkKey);
user._network=network;
networkTwo.doUpdate();
// network and networkTwo are now different
Memcache.put(networkKey,networkTwo);
// user._network still refers to the initial instance of network, not networkTwo
You'll need additional logic in your code if you want your User objects to always refer to the version of network that is in memcache.
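One way to add that logic (a minimal sketch, using a plain dict to stand in for the Memcache API; the class and key names here are invented for illustration) is to have User store only the network's key and re-fetch the Network from the cache on every access, instead of holding a direct reference:

```python
# A dict stands in for the Memcache service; in real GAE code this
# would be the memcache API. All names here are hypothetical.
cache = {}

class Network:
    def __init__(self, name):
        self.name = name

class User:
    def __init__(self, network_key):
        self._network_key = network_key  # store the key, not the instance

    @property
    def network(self):
        # Re-fetch on every access, so we always see the cached version.
        return cache.get(self._network_key)

cache["net1"] = Network("alpha")
user = User("net1")

# Another request replaces the cached value with a new instance...
cache["net1"] = Network("beta")

# ...and the user sees the update, because it looks the key up each time.
print(user.network.name)  # beta
```

The trade-off is an extra cache lookup per access, in exchange for never holding a stale instance.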

The code does update the Network object in memcache, but references to the object held elsewhere in your application are not updated, because Memcache stores serialized copies of objects rather than live instances.
The best way to use memcache in GAE is in conjunction with something like Objectify for caching datastore objects and for storing large collections of objects that are expensive to generate. It's not well suited to storing objects that change frequently or that are used by many other objects in your application.

Related

Object Store not getting shared between the Mule runtimes

I have created two Mule runtimes on different Windows VMs and clustered them via Runtime Manager.
I created a Mule app with an Object Store, set persistent to true, and deployed it to the cluster. I updated the Object Store value on one server, but when I try to fetch it from the other, the updated value is not returned.
What setting am I missing here?
FYI: this is on Mule 4.2.2
To be shared between the cluster nodes, the object store has to be set to non-persistent. A persistent object store usually means it persists to disk; to be shared in a cluster, it needs to be in memory. The cluster backend has a shared-memory implementation that will share the object store.
Do not rely on the Object Store, especially in production.
Use it only for non-essential data sharing.
It is easy to corrupt when multiple requests arrive from different servers at the same moment.
https://simpleflatservice.com/mule4/IgnoreUnreliableObjectStorage.html
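The corruption risk from concurrent read-modify-write cycles can be illustrated without any Mule at all. This sketch (a plain dict standing in for the shared store) deterministically interleaves two simulated requests to show the classic lost-update problem:

```python
# A dict stands in for the shared object store.
store = {"counter": 0}

# Two "requests" on different servers each read the current value...
value_on_server_a = store["counter"]
value_on_server_b = store["counter"]

# ...each increments its local copy...
value_on_server_a += 1
value_on_server_b += 1

# ...and each writes back. The second write silently overwrites the first.
store["counter"] = value_on_server_a
store["counter"] = value_on_server_b

print(store["counter"])  # 1, not 2: one update was lost
```

Without an atomic check-and-set or a lock, any store accessed this way from multiple servers can lose updates in exactly this fashion.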

Write/Read with High Replication Datastore + NDB

So I have been reading a lot of documentation on HRD and NDB lately, yet I still have some doubts regarding how NDB caches things.
Example case:
Imagine a case where a user writes data and the app needs to fetch it immediately after the write, e.g. a user creates a "Group" (similar to a Facebook/LinkedIn group) and is redirected to the group immediately after creating it. (For now, I'm creating a group without assigning it an ancestor.)
Result:
When testing this sort of functionality locally (having enabled high replication), the immediate fetch of the newly created group fails. A NoneType is returned.
Question:
Having gone through the High Replication docs and Google IO videos, I understand that there is a higher write latency, however, shouldn't NDB caching take care of this? I.e. A write is cached, and then asynchronously actually written on disk, therefore, an immediate read would be reading from cache and thus there should be no problem. Do I need to enforce some other settings?
Pretty sure you are running into the HRD feature where queries are "eventually consistent". NDB's caching has nothing to do with this behavior.
I suspect it might be because of the redirect that the NoneType is returned.
https://developers.google.com/appengine/docs/python/ndb/cache#incontext
The in-context cache persists only for the duration of a single incoming HTTP request and is "visible" only to the code that handles that request. It's fast; this cache lives in memory. When an NDB function writes to the Datastore, it also writes to the in-context cache. When an NDB function reads an entity, it checks the in-context cache first. If the entity is found there, no Datastore interaction takes place.
Queries do not look up values in any cache. However, query results are written back to the in-context cache if the cache policy says so (but never to Memcache).
So you are writing the value to the cache, then redirecting, and the read fails because the redirect is a different HTTP request with its own in-context cache.
I'm reaching the limit of my knowledge here but I'd suggest initially that you try the create in a transaction and redirect when complete/success.
https://developers.google.com/appengine/docs/python/ndb/transactions
Also, when you put the group model into the datastore you'll get a key back. Can you pass that key (via urlsafe, for example) to the redirect? Then you'll be guaranteed to retrieve the data, since you have its explicit key; it can't have a key if it's not in the datastore, after all.
Also, I'd suggest trying it as-is on the production server; behaviour can be very different locally and in production.
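The difference between the eventually-consistent query and the strongly consistent get-by-key can be sketched with a toy model (plain dicts, invented names; this is not the NDB API) in which queries read from an index that lags behind writes:

```python
# Toy model of HRD behavior: get-by-key reads the authoritative store,
# while queries read from a replicated index that lags behind writes.
entities = {}        # authoritative store, read by get-by-key
query_index = {}     # eventually-consistent index used by queries

def put(key, value):
    entities[key] = value
    # In real HRD the index is updated asynchronously;
    # here we simply don't update it yet.

def get_by_key(key):
    return entities.get(key)

def query(value):
    return [k for k, v in query_index.items() if v == value]

put("group1", "My Group")

assert get_by_key("group1") == "My Group"  # immediate fetch by key works
assert query("My Group") == []             # a query right after the write can miss it

query_index.update(entities)               # later, replication catches up
assert query("My Group") == ["group1"]
```

This is why passing the new entity's key through the redirect works: the get-by-key path doesn't depend on the lagging index at all.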

Is there an App Engine shared memory or equivalent solution?

I'm building a GAE app that requires a cryptographic key to operate. I would like to avoid storing the key in code or in a persistent datastore, and instead upload the key whenever I start my app so that it will only reside in memory for the duration of the app's lifetime (from the time I upload the key until no instances are running.)
I understand that this is possible to do with a resident backend, but this seems too expensive (the cheapest backend is currently $58/month) just to keep one value in memory and serve it to other instances on demand.
Note that I'm not looking for a general robust shared-memory solution, just one value that is basically written once and read many times. Thanks.
I don't think that this can work the way you hope. The sources of data in GAE:
Files deployed with your app (war or whatever).
Per-instance memory (front-end or back-end).
Memcache.
Datastore (or SQL now, I suppose).
Blobstore.
Information retrieved via http requests (i.e. store it somewhere else).
1 and 4 are out, as per your question. 2 doesn't work by itself because the starting and stopping of instances is out of your control (it wouldn't scale otherwise), and persistent instances are expensive. 3 doesn't work by itself because Memcache can be cleared at any time. 5 is really no different from the datastore, as it is permanently stored on Google's servers. Maybe you could try 6 (store it somewhere else) and retrieve it into per-instance memory during instance startup. But I suspect that is no better security-wise (and, for that matter, doesn't match what you said you wanted).
It seems that a Memcache and local-memory solution might work if you have your server instances clear the memcached key on exit, and have existing server instances write/refresh the key regularly (for example, on every request).
That way the key will likely be there as long as an instance is operational and most likely not be there when starting up cold.
The same mechanism could also be used to propagate a new key and/or cycle server instances in the event of a key change.
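The refresh-on-every-request idea can be sketched like this (a dict stands in for Memcache, a small class stands in for a server instance; all names are invented for illustration):

```python
# A dict stands in for Memcache; entries may vanish at any time in reality.
memcache = {}

class Instance:
    def __init__(self):
        self.key = None  # per-instance memory

    def handle_request(self):
        if self.key is None:
            # Cold start: try to recover the key from Memcache.
            self.key = memcache.get("crypto_key")
        if self.key is not None:
            # Refresh the cache entry so it stays warm while we're alive.
            memcache["crypto_key"] = self.key
        return self.key

# The operator uploads the key to one running instance once.
a = Instance()
a.key = "s3cret"
a.handle_request()           # refreshes Memcache

b = Instance()               # a new instance starting up cold
print(b.handle_request())    # s3cret, recovered from the cache
```

The weak point remains a full restart where Memcache has also been cleared: then no instance can recover the key, and it must be uploaded again.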

Can I register an Ehcache event listener that triggers before the object is deleted?

I am currently studying whether it would be suitable for me to use Ehcache in Google App Engine, and I have one specific need.
I am building a game where the game state would be updated every turn. Currently, after each action I update the Memcache, then the datastore. And each turn, I load the state of the game first from the cache, then from the datastore if the cache is empty.
The reason I have to update the datastore each time is because there is no guarantee that the object won't have been purged from the cache.
My concern is that, most of the time (i.e. as long as the object is not evicted from the cache), all these datastore saves are useless.
I am thus looking for a way to trigger the datastore save only once, before the object is evicted from the cache.
It seems that this is not possible using GAE Memcache. I had a look at Ehcache, but it only provides notifications after the element has been removed. And, as per the documentation, "only what was the key of the element is known", which doesn't go well with what I want to do.
Has anyone already faced the same need? How have you handled it?
Thanks in advance for any hint
No, there's no way to be notified before an element is evicted from cache on App Engine. There's also no way to install an alternate caching system like EHCache.
Memcache is, as the name implies, a caching system. Even with an eviction mechanism, you shouldn't ever rely on it as primary storage.
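In practice that means treating the datastore as the source of truth and memcache purely as an accelerator. A minimal write-through sketch (plain dicts standing in for both services; names invented for illustration):

```python
datastore = {}   # source of truth
memcache = {}    # accelerator; entries may vanish at any time

def save(key, value):
    datastore[key] = value      # always persist first
    memcache[key] = value       # then refresh the cache

def load(key):
    if key in memcache:
        return memcache[key]    # fast path
    value = datastore.get(key)  # cache miss: fall back to the datastore
    if value is not None:
        memcache[key] = value   # repopulate for the next reader
    return value

save("game1", {"turn": 3})
memcache.clear()                # simulate an eviction
print(load("game1"))            # {'turn': 3}: nothing was lost
```

The per-request datastore writes the question wants to avoid are exactly the price of surviving an eviction.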

When to use a certain type of persistence in Google App Engine?

First of all I'll explain the question. By persistence, I mean storing data beyond the execution of a single request. It might not be the best question title, so feel free to edit it.
The way I see it, there are three types of persistence in GAE, each one "closer" to the request itself:
The datastore
This is where all data is most likely to be based. It may go into the higher layers of persistence temporarily, but in the end, this is where the data really is. Unfortunately, querying the datastore repeatedly is slow and uses a lot of resources.
Use when...
storing data that should be stored for an indefinite amount of time.
Avoid using when...
getting data that is queried often but rarely updated.
memcache
This is a highly complex caching engine that stores the data in memory and makes sure all users read from/write to the same cache. It's a much faster way to get/set data on a key→value basis than using the datastore. Unfortunately, data can only stay in the memory for so long, and there is no guarantee that it will stay for as long as you tell it to; the data may disappear at any time if memory is needed elsewhere.
Use when...
you need to get data more often than you need to update it. Even when data needs to be updated often, it can have its uses (if a few missed updates are considered okay), by setting up a task queue to persist data from the memcache to the datastore.
Avoid using when...
data needs to be updated often and has to be up-to-date when fetched.
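The task-queue idea mentioned above can be sketched as a write-behind cache: updates hit the cache immediately and are flushed to the datastore later in batches (dicts and a list standing in for memcache, the datastore, and the task queue; all names invented for illustration):

```python
datastore = {}
memcache = {}
task_queue = []  # keys pending persistence

def update(key, value):
    memcache[key] = value       # fast write, immediately visible to readers
    task_queue.append(key)      # schedule persistence for later

def flush_tasks():
    # A task-queue worker would run this periodically.
    while task_queue:
        key = task_queue.pop(0)
        if key in memcache:
            datastore[key] = memcache[key]

update("score", 10)
update("score", 20)
assert datastore.get("score") is None  # not persisted yet
flush_tasks()
print(datastore["score"])  # 20
```

Any update sitting only in the cache when an eviction happens is lost, which is why this only suits data where a few missed updates are acceptable.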
Global variables
This isn't an official method of persisting data, but it works. However, it's the least reliable method, and since it has no data synchronization across servers, persisted data may show up differently for different users (but from what I've found, the server rarely changes for the same user.) Theoretically, this should be the method that has the least overhead in getting/setting values, however, and could have its uses.
Use when...
hell freezes over? I don't know... I haven't enough knowledge about what goes on behind the scenes to actually rely on this method. Discuss!
Avoid using when...
you rely on the data being the same across servers.
Cookies
If the data is user-specific, it can be efficient to store it as a cookie in the user's browser. There are some pitfalls to watch out for though:
Security – the user can meddle with cookies, and malicious people could potentially do the same. To make sure that the contents are unreadable and unchangeable to all, the cookie can be encrypted using the PyCrypto library which is available on GAE.
Performance – since cookies are sent with every request (even images), it can add to the bandwidth being used, and slow down requests. One solution is to use another domain for static content, so the browser won't send the cookie for that content.
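For the "unchangeable" part of the security pitfall, signing is enough even without PyCrypto: an HMAC makes tampering detectable, though unlike encryption it does not hide the contents. A stdlib-only sketch (the secret and cookie format here are invented for illustration):

```python
import hmac
import hashlib

SECRET = b"server-side-secret"  # hypothetical; never ship this to the client

def sign_cookie(value):
    sig = hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()
    return value + "|" + sig

def verify_cookie(cookie):
    value, _, sig = cookie.rpartition("|")
    expected = hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()
    if hmac.compare_digest(sig, expected):
        return value
    return None  # tampered or malformed

cookie = sign_cookie("user_id=42")
assert verify_cookie(cookie) == "user_id=42"

# A tampered value no longer matches its signature.
tampered = "user_id=1|" + cookie.rpartition("|")[2]
assert verify_cookie(tampered) is None
```

If the contents must also be unreadable, a cipher (e.g. via PyCrypto, as mentioned above) is needed on top of, or instead of, the signature.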
When should the different types of persistence be used? How can they be combined to reduce/even out the amount of resources being spent?
Datastore
Use the datastore to hold any long-lived information. The datastore should be used like a normal database, to hold the data your site/application works with.
MemCache
Use this to access data much more quickly than the datastore can. Memcache can return data very quickly, and it can be used for any data that needs to span multiple requests from users. It is normally data that was originally in the datastore and was then copied into memcache.
def get_data(self):
    data = memcache.get("key")
    if data is not None:
        return data
    else:
        data = self.query_for_data()  # get data from the datastore
        memcache.add("key", data, 60)
        return data
Memcache will evict the item when it expires; you set the expiry time (in seconds) in the last parameter of the add call shown above.
Global Variables
I wouldn't use these at all, since they can't span instances. In GAE, requests may be served by any of your app's instances, and instances can be started and stopped at any time (in Python as in the other runtimes). If you want to use global variables, I would store the data in memcache instead.
Your post is a good summary of the 3 major options. You mostly have answered the question already. However, if you are currently building an app and stressing over whether or not you should memcache something, try this:
Write your app using the datastore for everything that needs to outlive more than one request.
Once your app (or some usable subset) is working, run some functional tests or simulations to see where the slow spots (or high quota usage) are.
Find the slowest or most inefficient request path, and figure out how to make it faster (either by using memcache, or by altering your data structures so you can do gets instead of queries, or possibly by storing something in a global instance variable*).
goto 2 until you're satisfied.
*Things that might be good for a "global" variable would be something that is relatively expensive to create/fetch, that a substantial portion of your requests will use, and that does not need to be consistent across requests/users.
I use a global variable to speed up JSON conversion. Before I convert my data structure to JSON, I hash it and check whether the JSON is already available. For my app this gives quite a speedup, as the pure-Python implementation is quite slow.
Global variables
To complement AutomatedTester's answer, and also to reply to his further question about how to share information between GETs without memcache or the datastore, below is a quick illustration of how to use global variables:
if 'i' not in globals():
    i = 0

def main():
    global i
    i += 1
    print 'Status: 200'
    print 'Content-type: text/plain\n'
    print i

if __name__ == '__main__':
    main()
Calling this script multiple times will give you 1, 2, 3... Of course, as mentioned earlier by Blixt, you should not count on this trick too much ('i' can sometimes switch back to zero), but it can be useful for storing user-specific information in a dictionary (session data, for instance).