I have a Google App Engine project with a ~40 GB database, and I'm getting poor read performance with NDB. I've noticed that my memcache size (as listed on the dashboard) is only about 2 MB. I would expect NDB to implicitly make more use of memcache to improve performance.
Is there a way of debugging NDB's memcache usage?
The question is rather poorly formulated -- there are a zillion reasons for poor read performance, and most are due to a poorly written app, but you don't tell us anything about the app.
The only question that can be answered is the final one: "Is there a way of debugging NDB's memcache usage?" In addition to Sologoub's pointers I'd suggest using Appstats to find out whether the expected memcache calls are actually being made. And reading the NDB source code, especially get() and put() in context.py, might also help.
This page does a good job explaining how NDB caching works:
https://developers.google.com/appengine/docs/python/ndb/cache
The first place I'd check would be the policy settings to make sure that you are in fact telling NDB to cache what you want cached: https://developers.google.com/appengine/docs/python/ndb/cache#policy_functions
Related
I am new to Google Cloud Datastore. I have read that there is a write limit on 1 write per second on an entity group. Does it means that the main "guestbook" tutorial on app engine cannot scale to thousands of very active users?
Indeed.
The tutorial is just a showcase. The writes per second limitation is due to the strong consistency guarantee for entities in the same group or ancestor. This limit can be exceed at the price of changing strong consistency by eventual consistency, meaning all datastore queries will show the same information at some point. This is due to App engine distributed design.
Please, have a look at https://cloud.google.com/appengine/articles/scaling/contention to avoid datastore contention issues. Hope it helps.
Yes, I think it does mean that.
It might not be a problem if the greetings are all added to different guestbooks, but quickly adding Greetings to the same Guestbook is definitely not gonna scale. However, in practice it's often much faster than 1 write per second.
Perhaps you could work around this by using a taskqueue to add Greetings, but that might be overkill.
That guestbook tutorial is not a very good example in general. You shouldn't put logic in your jsp's like in that example (you probably shouldn't use jsp's at all). It's also not very practical to use the datastore at such a low level. Just use Objectify.
From what I have heard, it is better to move to NDB from Datastore. I would be doing that eventually since I hope my website will be performance intensive. The question is when. My project is in its early stages.
Is it better to start in NDB itself? Does NDB take care of Memcache also. So I don't need to have an explict Memcache layer?
NDB provides an automated caching mechanism. See Caching:
NDB automatically caches data that it writes or reads (unless an
application configures it not to). Reading from cache is faster than
reading from the Datastore.
Probably the automatic caching does what you want. The rest of this
page provides more detailed information in case you want to know more
or to control some parts of the caching behavior.
As the documentation says, the default behavior probably does what you want, but you can tweak it if that's not the case. Adding your own memcache layer for the datastore shouldn't be required if you're using NDB.
As for when to migrate, sooner is probably better. The longer you wait the more code you have to rewrite to take advantage of the freebies you get with NDB. For new projects, I would recommend starting with NDB.
To add to Dan's correct answer, remember ndb and the older db are just APIs so you can seamlessly begin switching to ndb without worrying about schema changes etc.. You're question asks about switching from datastore to NDB, but you're not switching from the datastore as NDB still uses the datastore. Make sense?
I have seen a little of this in stack overflow but I am wondering if there is any reason to use the DB entity model and what the specific pros and cons of using on or the other are.
I have read the ndb is a little faster and that it helps with caching. They have a good bit of info in the docs but don't really straight out say that ndb is better. At least I haven't found that yet.
As far as I can tell ndb is an evolution of db, kept seperate to maintain compatability.
Have a look at the cheat sheet, it details the main differences
https://docs.google.com/document/d/1AefylbadN456_Z7BZOpZEXDq8cR8LYu7QgI7bt5V0Iw/mobilebasic
But it does not mention the other features such as computed properties.
If you are starting a new project I see no reason not to use ndb and every reason to.
EDIT: Alt link for document: https://docs.google.com/document/d/1AefylbadN456_Z7BZOpZEXDq8cR8LYu7QgI7bt5V0Iw/edit#
I'm thinking of porting an application from RoR to Python App Engine that is heavily geo search centric. I've been using one of the open source GeoModel (i.e. geohashing) libraries to allow the application to handle queries that answer questions like "what restaurants are near this point (lat/lng pair)" and things of that nature.
GeoModel uses a ListProperty which creates a heavy index which had me concerned about pricing as i have about 10 million entities that would need to be loaded into production.
This article that I found this morning seems pretty scary in terms of costs:
https://groups.google.com/forum/?fromgroups#!topic/google-appengine/-FqljlTruK4
So my question is - is geohashing a moot concept now that Google has released their full text search which has support for geo searching? It's not clear what's going on behind the scenes with this new API though and I'm concerned the index sizes might be just as big as if I used the GeoModel approach.
The other problem with the search API is that it appears I'd have to create not only my models in the datastore but then replicate some of that data (GeoPtProperty and entity_key for the model it represents at a minimum) into Documents which greatly increases my data set.
Any thoughts on this? At the moment I'm contemplating scraping this port as being too expensive although I've really enjoyed working in the App Engine environment so far and would love to get away from EC2 for some of my applications.
You're asking many questions here:
is geohashing a moot concept: Probably not, I suspect the Search API uses geohashing, or something similar for its location search.
can you use the Search API vs implementing it yourself: yes, but I don't know the cost one way or the other.
is geohashing expensive on app engine: in the message thread the cost is bad due to high index write costs. you'll have to engineer your geohashing data to minimize the indexing. If GeoModel puts a lot of indexed values in the list, you may be in trouble - I wouldn't use it directly without knowing how the indexing works. My guess is that if you reduce the location accuracy you can reduce the number of indexed entries, and that could save you a lot of cost.
As mentioned in the thread, you could have the geohashing run in CloudSQL.
I am using djangoappengine and I think have run into some problems with the way it handles eventual consistency on the high application datastore.
First, entity groups are not even implemented in djangoappengine.
Second, I think that when you do a djangoappengine get, the underlying app engine system is doing an app engine query, which are only eventually consistent. Therefore, you cannot even assume consistency using keys.
Assuming those two statements are true (and I think they are), how does one build an app of any complexity using djangoappengine on the high replication datastore? Every time you save a value and then try to get the same value, there is no guarantee that it will be the same.
Take a look in djangoappengine/db/compiler.py:get_matching_pk()
If you do a djangomodel.get() by the pk, it'll translate to a Google App Engine Get().
Otherwise it'll translate to a query. There's room for improvement here. Submit a fix?
Don't really know about djangoappengine but an appengine query if it includes only key is considered a key only query and you will always get consistent results.
No matter what the system you put on top of the AppEngine models, it's still true that when you save it to the datastore you get a key. When you look up an entity via its key in the HR datastore, you are guaranteed to get the most recent results.