I switched to NDB for a new app, which, as I understand it, includes memcache support 'for free'.
So I put an entity in the datastore:
class MyStorage(ndb.Model):
    pickled_data = ndb.BlobProperty()

obj = MyStorage(parent=ndb.Key('top_level_key', 'second_level_key'),
                pickled_data=pickle.dumps(my_attr))
obj.put()
In other requests I then retrieve it using:
obj = pickle.loads(MyStorage.query(
    ancestor=ndb.Key('top_level_key', 'second_level_key')).get().pickled_data)
But the delay when testing it deployed on App Engine tells me there's no caching going on (obviously none is expected on the first call, but subsequent calls should show a speed-up).
I checked the Memcache Viewer and, sure enough, there are zeroes under every metric. So I'm obviously missing something about the free NDB caching. Can someone point out what it is?
NDB will only read from its cache when you fetch by key, i.e. .get_by_id() (or .get() on a Key). The cache is not used when you use .query().
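For example, a minimal sketch that stores the entity under a fixed string id and fetches it by key; the 'storage' id is a made-up name, and MyStorage is the model above:

parent = ndb.Key('top_level_key', 'second_level_key')

# Write: give the entity a known id so it can be fetched by key later.
obj = MyStorage(id='storage', parent=parent,
                pickled_data=pickle.dumps(my_attr))
obj.put()

# Read (in a later request): key.get() checks NDB's in-context cache and
# memcache before falling back to the datastore; a .query() never does.
key = ndb.Key(MyStorage, 'storage', parent=parent)
my_attr = pickle.loads(key.get().pickled_data)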
I have an RSS feed from a 3rd-party website that builds a cache stored in cache folders. Every request takes 25-40 seconds the first time; after that it is served from the cache for 9-10 minutes.
Problem 1: GAE doesn't allow writing to the file system. So how should I provide caching?
Problem 2: The request takes 25-40 seconds every time the cache times out. How should I approach this?
Is there any way to sort this out, or do I need to use Google Compute Engine, which provides both facilities?
I've read articles about this but found no direct answer to my question. I was stuck for two days before posting here. Thank you.
You can cache on App Engine by simply sending back the appropriate cache headers in the HTTP response.
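For example, a minimal webapp2 sketch, assuming a hypothetical FeedHandler that serves the feed:

import webapp2

class FeedHandler(webapp2.RequestHandler):
    def get(self):
        # Let the client (and any intermediate caches) reuse this
        # response for 10 minutes instead of re-requesting it.
        self.response.headers['Cache-Control'] = 'public, max-age=600'
        self.response.write(build_feed())  # build_feed() is a placeholder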
Query the next results outside the user's request using a task queue, and prepare the new data to be served from the datastore and/or memcache. Subsequent user requests can then quickly serve the latest available data without the setup delay.
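A sketch of that approach with the task queue API; the /tasks/refresh_feed URL is a hypothetical handler that rebuilds the cached feed:

from google.appengine.api import taskqueue

# Enqueue the slow 25-40 s fetch as background work instead of doing
# it inside the user's request.
taskqueue.add(url='/tasks/refresh_feed')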
Instead of using the file system to store text, use a TextProperty in the NDB datastore. Give it a unique key so you can request it in the future.
This solves your caching problem too: requesting an entity by its key uses NDB's built-in cache, and on a cache miss the entity is fetched from the datastore.
Then add a cron job to update the datastore every ten minutes or so (see the cron.yaml sketch after the code below).
from google.appengine.ext import ndb

class RssFeed(ndb.Model):
    KEY = 'RSS'
    cache = ndb.TextProperty()
    last_updated = ndb.DateTimeProperty(auto_now=True)

    @classmethod
    def update(cls):
        # insert your code to fetch from the RSS feed
        # cache = ...
        rss_feed = cls.get_or_insert(cls.KEY)
        rss_feed.cache = cache
        rss_feed.put()

    @classmethod
    def fetch(cls):
        rss_feed = cls.get_by_id(cls.KEY)
        return rss_feed.cache
Call RssFeed.update() to refresh the cache and RssFeed.fetch() whenever you want the cached data.
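For the cron job mentioned above, a minimal cron.yaml sketch; the /tasks/update_rss URL is a hypothetical handler that calls RssFeed.update():

cron:
- description: refresh the cached RSS feed
  url: /tasks/update_rss
  schedule: every 10 minutes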
What is the performance of Google Cloud Endpoints? In my case a large blob is being transferred, anywhere from 1 MB to 8 MB in size. It seems to take about 5 to 10 minutes, with my broadband upload speed being about 1 Mb/s.
Note this is being done from a Java client calling an endpoint. The object being transferred looks like this:
public class Item {
    String type;
    byte[] data;
}
On the Java client side, the code looks like this:
Item item = new Item(type, s);
MyItem.Builder builder = new MyItem.Builder(new NetHttpTransport(), new GsonFactory(), null);
service = builder.build();
PutItem putItem = service.putItem(item);
putItem.execute();
Why does it take so long to send one of these up to an endpoint? Is it the JSON parsing that is slowing it down? Any ideas on how to speed this up?
Endpoints is just a wrapper around HTTP requests made to Java servlets (you mention Java on the client, so I'll assume Java on the server too).
This introduces a very small, fixed overhead, but the transfer speed of a large blob should be no different whether you use Endpoints or not.
As noted by Gilberto, you should probably consider using Google Cloud Storage (the GCS API is slowly replacing the Blobstore API). You can use it to upload directly to storage and remove the burden on your GAE app.
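If your server side happens to be Python, a minimal sketch of handing the client a direct-to-GCS upload URL (the bucket name and callback path are made-up; the Java runtime has equivalent Blobstore/GCS APIs):

import webapp2
from google.appengine.ext import blobstore

class UploadUrlHandler(webapp2.RequestHandler):
    def get(self):
        # A one-shot URL the client POSTs the blob to directly,
        # bypassing the Endpoints/JSON path entirely.
        upload_url = blobstore.create_upload_url('/upload_done',
                                                 gs_bucket_name='my-bucket')
        self.response.write(upload_url)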
There is (experimental) search support for Python and Java, and eventually Go may be supported too. Until then, how can I do minimal search on my records?
Through the mailing list I got the idea of proxying search requests to a Python backend. I'm still evaluating GAE and haven't used backends yet. To set up search with a Python backend, do I have to send all requests (from Go) to the datastore through this backend? How practical is that, and what are the disadvantages? Is there any tutorial on this?
Thanks.
You could make a RESTful Python app with a few handlers, and your Go app would make urlfetches to the Python app. You can then run the Python app as either a backend or a frontend (with a different version than your Go app). The first handler would receive a key as input, fetch that entity from the datastore, and store the relevant info in the search index. The second handler would receive a query, run a search against the index, and return the results. You would also need a handler for removing documents from the search index, and for any other operations you want.
Instead of having the first handler receive a key and fetch from the datastore, you could also just send it the entity data in the fetch.
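A minimal sketch of those two Python handlers using the App Engine Search API; the index name, field name, and request parameters are made-up:

import webapp2
from google.appengine.api import search

_INDEX = search.Index(name='records')  # hypothetical index name

class IndexHandler(webapp2.RequestHandler):
    def post(self):
        # The Go app posts the entity data directly (the variant
        # suggested above), keyed by the entity's key string.
        doc = search.Document(
            doc_id=self.request.get('key'),
            fields=[search.TextField(name='body',
                                     value=self.request.get('body'))])
        _INDEX.put(doc)

class SearchHandler(webapp2.RequestHandler):
    def get(self):
        # Run the query against the index and return matching doc ids.
        results = _INDEX.search(self.request.get('q'))
        self.response.write('\n'.join(doc.doc_id for doc in results))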
You could also use a service like IndexDen for now (especially if you don't have many entities to index):
http://indexden.com/
When making urlfetches, keep in mind that the quotas currently apply even when requesting URLs from your own app. There are two issues in the tracker requesting that these quotas be removed or increased when communicating with your own apps, but there is no guarantee that will happen. See here:
http://code.google.com/p/googleappengine/issues/detail?id=8051
http://code.google.com/p/googleappengine/issues/detail?id=8052
There is full text search coming for the Go runtime very very very soon.
Is there good information on how to use the Python fixture module with Google App Engine's NDB?
It seems there are a few problems, such as:
obj.delete() on teardown (in NDB it's obj.key.delete())
It is not intuitive how to set up nested StructuredProperty elements.
Are there workarounds that let the fixture module work with NDB, or an alternative fixture system that does?
Thank you.
I'm guessing that fixture's GoogleDatastoreFixture class intercepts datastore operations at the ext.db module level. Since NDB has a different API, it needs changing. Perhaps you could contribute a GoogleNdbFixture class. Or perhaps the right thing to do would be to intercept things at a lower level; again, that's something you might take up with fixture's author to see if there's a way you can help.
Did you consider using Testbed? It sets up the GAE service stubs appropriately, so you can test against the datastore (and other services), and it will tear down all your datastore writes after each test.
To create fixtures for your tests, just put entities directly into the datastore in the setUp() method. You can use the NDB API both to create fixtures and in the tests themselves if you like.
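A minimal sketch of that setup; MyModel is a hypothetical stand-in for your real NDB model:

import unittest
from google.appengine.ext import ndb, testbed

class MyModel(ndb.Model):  # stand-in for your real model
    name = ndb.StringProperty()

class MyTestCase(unittest.TestCase):
    def setUp(self):
        self.testbed = testbed.Testbed()
        self.testbed.activate()
        self.testbed.init_datastore_v3_stub()
        self.testbed.init_memcache_stub()  # NDB caches through memcache
        ndb.get_context().clear_cache()    # avoid stale NDB context cache
        # Fixtures: just put entities directly with NDB.
        MyModel(id='fixture1', name='example').put()

    def tearDown(self):
        self.testbed.deactivate()

    def test_fixture_present(self):
        self.assertEqual('example', MyModel.get_by_id('fixture1').name)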
I'm developing a Spring-based web application that is deployed on Google App Engine.
I have a manager that stores data in application scope. I'm using a Spring bean (singleton) that holds a map and simply performs get and remove operations on it.
However, GAE is a distributed environment, and this design has a problem: each application instance will have its own manager, and requests are not guaranteed to be routed to the same instance.
So I've looked around and found two possible solutions:
use the datastore
use memcache
Storing the data in the datastore would cause a lot of reads and writes, and ultimately I don't need the data to be persisted.
The second option looked promising, but Google mentions that:
In general, an application should not expect a cached value to always be available.
and I need a guarantee that when I ask my manager for a value, it will return it.
Is there any other solution?
Did I miss anything?
A common solution is storing the values in memcache, backed by the datastore.
First fetch the application-scope value from memcache; if memcache returns no result (a cache miss), fetch the value from the datastore and put it into memcache.
That way, the next fetch from memcache will return your application-scope data, reducing the need for (relatively) costly reads from the datastore.
App Engine memcache is a least-recently-used cache, so values that are read frequently will see few cache misses.
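A sketch of that read-through pattern in Python (the same flow applies from Java via the MemcacheService and datastore APIs); AppValue and key_name are made-up names:

from google.appengine.api import memcache
from google.appengine.ext import ndb

class AppValue(ndb.Model):  # hypothetical model holding one value
    value = ndb.StringProperty()

def get_app_value(key_name):
    # Try memcache first.
    cached = memcache.get(key_name)
    if cached is not None:
        return cached
    # Cache miss: fall back to the datastore and repopulate memcache.
    entity = AppValue.get_by_id(key_name)
    if entity is None:
        return None
    memcache.set(key_name, entity.value)
    return entity.value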
A more complex way to hold an application-scope value is to store it in memory on a resident backend and have all your other instances read/update the value from/to that particular backend via a servlet handler.