When to transition from Datastore to NDB? - google-app-engine

From what I have heard, it is better to move to NDB from Datastore. I will be doing that eventually, since I hope my website will be performance-intensive. The question is when. My project is in its early stages.
Is it better to start with NDB from the beginning? Does NDB also take care of memcache, so that I don't need an explicit memcache layer?

NDB provides an automated caching mechanism. See Caching:
NDB automatically caches data that it writes or reads (unless an application configures it not to). Reading from cache is faster than reading from the Datastore.
Probably the automatic caching does what you want. The rest of this page provides more detailed information in case you want to know more or to control some parts of the caching behavior.
As the documentation says, the default behavior probably does what you want, but you can tweak it if that's not the case. Adding your own memcache layer for the datastore shouldn't be required if you're using NDB.
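The lookup order behind this automatic caching can be sketched as a toy model: for a get by key, NDB checks its per-request in-context cache, then memcache, then the Datastore. This is not NDB's code. `ToyDatastore`, `ToyNdbContext` and the plain dicts standing in for memcache and the in-context cache are hypothetical, and the model covers only get-by-key.

```python
class ToyDatastore:
    """Hypothetical stand-in for the Datastore that counts reads reaching it."""

    def __init__(self):
        self.rows = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)


class ToyNdbContext:
    """Hypothetical model of NDB's get-by-key lookup order:
    in-context cache -> memcache -> Datastore."""

    def __init__(self, datastore, memcache):
        self.datastore = datastore
        self.memcache = memcache   # shared by all requests (a dict here)
        self.context_cache = {}    # lives for a single request

    def get(self, key):
        if key in self.context_cache:
            return self.context_cache[key]
        if key in self.memcache:
            value = self.memcache[key]
        else:
            value = self.datastore.get(key)
            if value is not None:
                self.memcache[key] = value  # populate memcache on a miss
        if value is not None:
            self.context_cache[key] = value
        return value


store = ToyDatastore()
store.rows["User:1"] = {"name": "alice"}
shared_memcache = {}

first_request = ToyNdbContext(store, shared_memcache)
first_request.get("User:1")   # misses both caches, reads the Datastore
first_request.get("User:1")   # served from the in-context cache

second_request = ToyNdbContext(store, shared_memcache)
second_request.get("User:1")  # fresh context, but a memcache hit
print(store.reads)            # 1
```

Only the first lookup reaches the Datastore, which is the "freebie" an explicit memcache layer would otherwise have to provide.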
As for when to migrate, sooner is probably better. The longer you wait, the more code you have to rewrite to take advantage of the freebies you get with NDB. For new projects, I would recommend starting with NDB.

To add to Dan's correct answer, remember that ndb and the older db are just APIs, so you can seamlessly begin switching to ndb without worrying about schema changes etc. Your question asks about switching from the datastore to NDB, but you're not switching away from the datastore, since NDB still uses the datastore. Make sense?

Related

Does GAE datastore internally use memcache?

As you can see from the attached screenshot, the datastore asks memcache to delete an entry inside a put(). What's that?
At least NDB's datastore caching does include memcache.
The pattern you observed could be explained in this section:
Memcache does not support transactions. Thus, an update meant to be applied to both the Datastore and memcache might be made to only one of the two. To maintain consistency in such cases (possibly at the expense of performance), the updated entity is deleted from memcache and then written to the Datastore.
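The ordering described in that quote can be illustrated with a minimal sketch, assuming plain dicts for memcache and a hypothetical `FlakyDatastore` whose next write can be made to fail. It is a simplified model of the documented behaviour, not NDB's actual code.

```python
class FlakyDatastore:
    """Hypothetical stand-in for the Datastore whose next write can be made to fail."""

    def __init__(self):
        self.rows = {}
        self.fail_next_write = False

    def put(self, key, value):
        if self.fail_next_write:
            self.fail_next_write = False
            raise IOError("simulated Datastore write failure")
        self.rows[key] = value


def put_entity(key, value, datastore, memcache):
    # Memcache and the Datastore cannot be updated in one transaction,
    # so drop the cached copy first...
    memcache.pop(key, None)
    # ...and only then write the authoritative copy.
    datastore.put(key, value)


def get_entity(key, datastore, memcache):
    # Read-through: serve from memcache, else load and repopulate it.
    if key in memcache:
        return memcache[key]
    value = datastore.rows.get(key)
    if value is not None:
        memcache[key] = value
    return value


ds, mc = FlakyDatastore(), {}
put_entity("Counter:1", 1, ds, mc)
get_entity("Counter:1", ds, mc)  # caches the value 1

ds.fail_next_write = True
try:
    put_entity("Counter:1", 2, ds, mc)
except IOError:
    pass  # the write failed, but the cache was already invalidated

print(get_entity("Counter:1", ds, mc))  # 1: cache and Datastore still agree
```

If the code had written 2 to memcache first and the Datastore write had then failed, readers would see 2 from the cache while the Datastore still held 1. The sketch ignores concurrent readers refilling the cache between the two steps. The real NDB implementation does more bookkeeping than this.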

GAE Memcache Usage for NDB Seems Low

I have a Google App Engine project with a ~40 GB database, and I'm getting poor read performance with NDB. I've noticed that my memcache size (as listed on the dashboard) is only about 2 MB. I would expect NDB to implicitly make more use of memcache to improve performance.
Is there a way of debugging NDB's memcache usage?
The question is rather poorly formulated -- there are a zillion reasons for poor read performance, and most are due to a poorly written app, but you don't tell us anything about the app.
The only question that can be answered is the final one: "Is there a way of debugging NDB's memcache usage?" In addition to Sologoub's pointers I'd suggest using Appstats to find out whether the expected memcache calls are actually being made. And reading the NDB source code, especially get() and put() in context.py, might also help.
This page does a good job explaining how NDB caching works:
https://developers.google.com/appengine/docs/python/ndb/cache
The first place I'd check would be the policy settings to make sure that you are in fact telling NDB to cache what you want cached: https://developers.google.com/appengine/docs/python/ndb/cache#policy_functions
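Here is a sketch of what such a policy might look like. NDB policy functions receive a key and return a bool, and `Context.set_memcache_policy()` installs one on the context. A per-model alternative is setting `_use_memcache = False` on a model class. In the code, `StubKey` is a hypothetical stand-in for `ndb.Key` so the function can run outside App Engine, and the excluded kind names are made up for illustration.

```python
class StubKey:
    """Hypothetical stand-in for ndb.Key, exposing only kind()."""

    def __init__(self, kind, id_):
        self._kind = kind
        self._id = id_

    def kind(self):
        return self._kind


# Hypothetical kinds: large or rarely re-read entities that would only
# push hotter data out of memcache.
NO_MEMCACHE_KINDS = frozenset(["AuditLog", "RawUpload"])


def memcache_policy(key):
    """Return True if the entity for this key should be stored in memcache."""
    return key.kind() not in NO_MEMCACHE_KINDS


# In a real NDB app (not runnable outside App Engine) it would be installed with:
#   from google.appengine.ext import ndb
#   ndb.get_context().set_memcache_policy(memcache_policy)

print(memcache_policy(StubKey("User", 1)))      # True
print(memcache_policy(StubKey("AuditLog", 7)))  # False
```

A policy that returns False for the kinds you actually read most would explain a memcache footprint far smaller than expected.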

DjangoAppEngine and Eventual Consistency Problems on the High Replication Datastore

I am using djangoappengine, and I think I have run into some problems with the way it handles eventual consistency on the High Replication Datastore.
First, entity groups are not even implemented in djangoappengine.
Second, I think that when you do a djangoappengine get, the underlying App Engine system runs an App Engine query, and queries are only eventually consistent. Therefore, you cannot even assume consistency when using keys.
Assuming those two statements are true (and I think they are), how does one build an app of any complexity using djangoappengine on the high replication datastore? Every time you save a value and then try to get the same value, there is no guarantee that it will be the same.
Take a look in djangoappengine/db/compiler.py:get_matching_pk()
If you do a djangomodel.get() by the pk, it'll translate to a Google App Engine Get().
Otherwise it'll translate to a query. There's room for improvement here. Submit a fix?
I don't really know about djangoappengine, but an App Engine query that includes only the key is considered a keys-only query, and you will always get consistent results.
No matter what the system you put on top of the AppEngine models, it's still true that when you save it to the datastore you get a key. When you look up an entity via its key in the HR datastore, you are guaranteed to get the most recent results.
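The difference between a lookup by key and a query can be modelled with a toy store. This is a hypothetical simplification (`ToyHrdStore` and its methods are made up). `put()` makes the entity visible to `get()` at once, but the index that `query()` reads is only updated when `apply_pending_index_updates()` runs. That deferred step stands in for the HR Datastore's index lag.

```python
class ToyHrdStore:
    """Hypothetical model: get-by-key is strongly consistent,
    index-backed queries only see applied index updates."""

    def __init__(self):
        self.entities = {}  # key -> dict of properties
        self.index = {}     # (property, value) -> set of keys
        self.pending = []   # index updates not yet applied

    def put(self, key, props):
        self.entities[key] = dict(props)          # visible to get() immediately
        self.pending.append((key, dict(props)))   # index update is deferred

    def get(self, key):
        return self.entities.get(key)

    def query(self, prop, value):
        return sorted(self.index.get((prop, value), set()))

    def apply_pending_index_updates(self):
        for key, props in self.pending:
            for keys in self.index.values():
                keys.discard(key)  # drop the key's old index rows
            for prop, value in props.items():
                self.index.setdefault((prop, value), set()).add(key)
        self.pending.clear()


hrd = ToyHrdStore()
hrd.put("Article:42", {"status": "published"})
print(hrd.get("Article:42"))               # {'status': 'published'}
print(hrd.query("status", "published"))    # []: index has not caught up yet
hrd.apply_pending_index_updates()
print(hrd.query("status", "published"))    # ['Article:42']
```

In terms of the djangoappengine discussion, a `get()` by pk behaves like `hrd.get()`, while any lookup translated into a query behaves like `hrd.query()`.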

Efficient web services using AppEngine

I'm trying to use AppEngine as a sort of RESTful web service. The service is supposed to do simple finds and puts from the Datastore, so Objectify seems good for covering that part. It also does a few lookups to other services if data is not available in the Datastore. I'm using Redstone XMLRPC for that part.
Now, I have a couple of questions about how to design the serving part in view of AppEngine's quotas (I guess one should think about efficiency in most cases, but AppEngine's billing makes more people think about it).
First, let's consider using simple Servlets. In this case, I see two options. Either I create a number of servlets, each providing a different service with JSON passed to it, or I use a single servlet (or a smaller number of them) and determine the action to perform based on a parameter passed with the JSON. Will either design make any difference to the number of hours, etc. clocked by AppEngine?
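The single-endpoint design from the question can be sketched as a dispatch table keyed on an `action` field. The sketch is in Python rather than Java servlets, and `find_item`, `save_item` and the field names are hypothetical.

```python
import json


def find_item(payload):
    # Hypothetical handler; a real one would query the Datastore.
    return {"found": payload.get("id")}


def save_item(payload):
    # Hypothetical handler; a real one would write to the Datastore.
    return {"saved": payload.get("id")}


# One endpoint, many operations: the "action" field picks the handler.
ACTIONS = {"find": find_item, "save": save_item}


def handle_request(body):
    """Return (HTTP status, JSON response text) for a raw JSON request body."""
    request = json.loads(body)
    handler = ACTIONS.get(request.get("action"))
    if handler is None:
        return 400, json.dumps({"error": "unknown action"})
    return 200, json.dumps(handler(request))


print(handle_request('{"action": "find", "id": 7}'))  # (200, '{"found": 7}')
```

The routing itself is a dictionary lookup, so it adds little work compared with the Datastore calls a handler would make. Splitting the same handlers across several servlets changes where the lookup happens, not how much the handlers cost.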
What is the cost difference if I use a RESTful framework such as Restlet or RESTEasy as opposed to the barebones approach?
This question is something of a follow-up to: Creating Java Web Service using Google AppEngine
It's not so important, because most of the cost goes to the datastore, so frontend micro-optimisation doesn't matter.
You may save a few cents by choosing a 'simple servlet', but is that your goal? It's much more important to design good data structures, prepare all required data in the background, use a good caching strategy, etc.
I agree with Igor.
However, there is an additional thing to consider: http sessions.
GAE supports HTTP sessions. Since GAE is a distributed system, sessions are stored in the Datastore (and cached in Memcache for efficient reading). The session is updated on every request (to support expiration), so the Datastore is accessed on every request.
Sessions are not required for REST and should be turned off.

Sharing memory-based data in Google App Engine

I'm loosely considering using Google App Engine for some Java server hosting, but I've come across what seems to be a bit of a problem whilst reading some of the docs. Most servers that I've ever written, and certainly the one I have in mind, require some form of memory-based storage that persists between sessions; however, GAE seems to provide no mechanism for this.
Data can be stored as static objects but an app may use multiple servers and the data cannot be shared between the servers.
There is memcache, which is shared, but since this is a cache it is not reliable.
This leaves only the datastore, which would work perfectly, but is far too slow.
What I actually need is a high-performance (i.e. memory-based) store that is accessible to, and consistent for, all client requests. In this case it is to provide a specialized locking and synchronization mechanism that sits in front of the datastore.
It seems to me that there is a big gap in the functionality here. Or maybe I am missing something?
Any ideas or suggestions?
Static data (data you upload along with your app) is visible, read-only, to all instances.
To share data between instances, use the datastore. Where low latency is important, cache in memcache. Those are the basic options. Reading from the datastore is pretty fast; it's only writes you'll need to concern yourself with, and those can be mitigated by making sure that any entity properties you don't need to query against are unindexed.
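Unindexed properties help because index entries are written along with the entity. As an illustration, consider the write-op formula from App Engine's legacy pricing table for a new entity put: 2 writes, plus 2 per indexed property value, plus 1 per composite index value. That formula is an assumption about the billing model of that era, and pricing has since changed, so check the current docs. In NDB, a property is excluded from indexes with `indexed=False`, e.g. `ndb.StringProperty(indexed=False)`.

```python
def new_entity_write_ops(indexed_values, composite_index_values=0):
    """Write ops for putting a new entity under the legacy App Engine
    pricing model (assumption: verify against current pricing docs)."""
    return 2 + 2 * indexed_values + composite_index_values


# A hypothetical entity with 10 single-valued properties:
print(new_entity_write_ops(10))  # 22: every property indexed
print(new_entity_write_ops(2))   # 6: only the 2 queried properties indexed
```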
Another option, if it fits your budget, is to run your own cache in an always-on backend server.
