I have a couple of doubts regarding how NDB projection queries work and how the caching behaves behind the scenes
So given a model similar to:
class Users(ndb.Model):
user_name = ndb.StringProperty(required=True)
user_email = ndb.StringProperty(required=True)
user_password = ndb.StringProperty(required=True)
#classmethod # THIS ONE DOES NOT WORK
def get_profile_info(cls, id):
return ndb.Key(Users, id).get(projection=[Users.user_name])
#classmethod # THIS ONE WORKS
def get_profile_info(cls, id):
return Users.query(Users.key == ndb.Key(Users, id)).get(projection=[Users.user_name])
Why does the first classmethod raise a "TypeError: Unknown configuration option ('projection')"? Can't I simply call a projection on a direct get of a key, instead of having to query for a key?
Secondly, regarding caching, I'm not sure if I correctly understood this thread: NDB Caching When Using Projected Queries
Aren't projected queries cached? Does this mean its better to simply call a get() (and fetch the whole instance) so it is cached, instead of projecting?
Thanks in advance!
As per the error a projection makes no sense when using get. From the docs " It only gets values for those properties in the projection. It gets this data from the query index (and thus, properties in the projection must be indexed)". So doing get isn't accessing the object properties via indexes. Note gvr's comment on caching in your referenced question.
Related
I have a simple little "Observation" class:
from google.appengine.ext import ndb
class Observation(ndb.Model):
remote_id = ndb.StringProperty()
dimension_id = ndb.IntegerProperty()
metric = ndb.StringProperty()
timestamp_observed = ndb.StringProperty()
timestamp_received = ndb.DateTimeProperty(auto_now_add=True)
#classmethod
def query_book(cls):
return cls.query()
I can run projection queries against the Datastore to return only certain columns. E.g:
observations = Observation.query().fetch(projection=[Observation.dimension_id])
This works nicely, but I only want unique results. The documentation makes this sound easy:
# Functionally equivalent
Article.query(projection=[Article.author], group_by=[Article.author])
Article.query(projection=[Article.author], distinct=True)
But when I do this:
observations = Observation.query().fetch(projection=[Observation.dimension_id], group_by=[Observation.dimension_id])
observations = Observation.query().fetch(projection=[Observation.dimension_id], distinct=True)
I get errors for both variants.
TypeError: Unknown configuration option ('group_by')
TypeError: Unknown configuration option ('distinct')
This happens on localhost and in the prod environment too. What am I missing?
Silly me - all of these params need to sit within the query() function, not within fetch(). The projection elements actually works in fetch(), but you need to move both the projection and distinct arguments into query() to get it to work.
From Grouping:
Projection queries can use the distinct keyword to ensure that only
completely unique results will be returned in a result set. This will
only return the first result for entities which have the same values
for the properties that are being projected.
Article.query(projection=[Article.author], group_by=[Article.author])
Article.query(projection=[Article.author], distinct=True)
Both queries are equivalent and will produce each author's name only
once.
Hope this helps anyone else with a similar problem :)
Let's say i have this really simple parent/child relatiosnship (any Answer class instances always has a Question parent):
class Answer(ndb.Model):
content = ndb.StringProperty()
timestamp = ndb.DateTimeProperty()
def to_message():
"""Returns a protoRPC message object of the answer"""
class Question(ndb.Model):
content = ndb.StringProperty()
answers = ndb.KeyProperty(repeated = True, kind = 'Answer')
def to_message(self):
"""Returns a protoRPC message object of the question"""
The two to message methods are simply used to return a protoRPC object.
The question is: in my to_message method, in the Question class, if i want to fetch all child Answer instances, retrieve them, and use their own to_message method to make them into a nice rpc Message, is it better to:
Iterate over the anwers repeated KeyProperty list
Do a query using a filter on the "parent" property, and iterate over the list it outputs
In terms of NDB access, the first method seems to be the best, but since we're going to go over the free limit anyway, i'm more wondering if the datastore is not more efficient at fetching stuff than i am, iterating over that list.
Edit: The original question has actually a very simple and obvious answer: the first way.
The real question would be, in case I have to filter out some Answer entities based on their attributes (for instance timestamp): is it better to query using a filter, or iterate over the list and use a condition to gather only the "interesting" entities?
With that schema you don't have to query anything because you already have the keys of each answer as a list of keys in question_entity.answers
So you only have to fetch the answers using that keys. Is better if you get all the answers in only one operation.
list_of_answers = ndb.get_multi(question_entity.answers)
(More info at NDB Entities and Keys)
On the other hand, if you model that relationship with a KeyProperty in Answer:
class Answer(ndb.Model):
question = ndb.KeyProperty(Question)
content = ndb.StringProperty()
timestamp = ndb.DateTimeProperty()
def to_message():
"""Returns a protoRPC message object of the answer"""
or with ancestors:
answer = Answer(parent=question_entity.key)
In these cases you should use a normal query for retrieve the answers:
answers = Answer.query(Answer.question == question_entity.key)
or an ancestor query:
answers = Answer.query(ancestor = question_entity.key)
respectively.
This means two jobs: Query the index plus fetching the datastore. In conclusion, in this case the first approach is cheaper for retrieving datastore data.
Using ndb.get_multi on the list of keys to fetch the Answers, and then iterating to call their to_message methods will be the most efficient.
Do I pay a penalty on query performance if I choose to query repeated property? For example:
class User(ndb.Model):
user_name = ndb.StringProperty()
login_providers = ndb.KeyProperty(repeated=true)
fbkey = ndb.Key("ProviderId", 1, "ProviderName", "FB")
for entry in User.query(User.login_providers == fbkey):
# Do something with entry.key
vs
class User(ndb.Model)
user_name = ndb.StringProperty()
class UserProvider(ndb.Model):
user_key = ndb.KeyProperty(kind=User)
login_provider = ndb.KeyProperty()
for entry in UserProvider.query(
UserProvider.user_key == auserkey,
UserProvider.login_provider == fbkey
):
# Do something with entry.user_key
Based on the documentation from GAE, it seems that Datastore take care of indexing and the first less verbose option would be using the index. However, I failed to find any documentation to confirm this.
Edit
The sole purpose of UserProvider in the second example is to create a one-to-many relationship between a user and it's login_provider. I wanted to understand if it worth the trouble of creating a second entity instead of querying on repeated property. Also, assume that all I need is the key from the User.
No. But you'll raise your write costs because each entry needs to be indexed, and write costs are based on the number of indexes updated.
I want to get an entity key knowing entity ID and an ancestor.
ID is unique within entity group defined by the ancestor.
It seems to me that it's not possible using ndb interface. As I understand datastore it may be caused by the fact that this operation requires full index scan to perform.
The workaround I used is to create a computed property in the model, which will contain the id part of the key. I'm able now to do an ancestor query and get the key
class SomeModel(ndb.Model):
ID = ndb.ComputedProperty( lambda self: self.key.id() )
#classmethod
def id_to_key(cls, identifier, ancestor):
return cls.query(cls.ID == identifier,
ancestor = ancestor.key ).get( keys_only = True)
It seems to work, but are there any better solutions to this problem?
Update
It seems that for datastore the natural solution is to use full paths instead of identifiers. Initially I thought it'd be too burdensome. After reading dragonx answer I redesigned my application. To my suprise everything looks much simpler now. Additional benefits are that my entities will use less space and I won't need additional indexes.
I ran into this problem too. I think you do have the solution.
The better solution would be to stop using IDs to reference entities, and store either the actual key or a full path.
Internally, I use keys instead of IDs.
On my rest API, I used to do http://url/kind/id (where id looked like "123") to fetch an entity. I modified that to provide the complete ancestor path to the entity: http://url/kind/ancestor-ancestor-id (789-456-123), I'd then parse that string, generate a key, and then get by key.
Since you have full information about your ancestor and you know your id, you could directly create your key and get the entity, as follows:
my_key = ndb.Key(Ancestor, ancestor.key.id(), SomeModel, id)
entity = my_key.get()
This way you avoid making a query that costs more than a get operation both in terms of money and speed.
Hope this helps.
I want to make a little addition to dargonx's answer.
In my application on front-end I use string representation of keys:
str(instance.key())
When I need to make some changes with instence even if it is a descendant I use only string representation of its key. For example I have key_str -- argument from request to delete instance':
instance = Kind.get(key_str)
instance.delete()
My solution is using urlsafe to get item without worry about parent id:
pk = ndb.Key(Product, 1234)
usafe = LocationItem.get_by_id(5678, parent=pk).key.urlsafe()
# now can get by urlsafe
item = ndb.Key(urlsafe=usafe)
print item
Are the children of an entity available in a Query?
Given:
class Factory(db.Model):
""" Parent-kind """
name = db.StringProperty()
class Product(db.Model):
""" Child kind, use Product(parent=factory) to make """
#property
def factory(self):
return self.parent()
serial = db.IntegerProperty()
Assume 500 factories have made 500 products for a total of 250,000 products. Is there a way to form a resource-efficient query that will return just the 500 products made by one particular factory? The ancestor method is a filter, so using e.g. Product.all().ancestor(factory_1) would require repeated calls to the datastore.
Although ancestor is described as a "filter", it actually just updates the query to add the ancestor condition. You don't send a request to the datastore until you iterate over the query, so what you have will work fine.
One minor point though: 500 entities with the same parent can hurt scalability, since writes are serialized to members of an entity group. If you just want to track the factory that made a product, use a ReferenceProperty:
class Product(db.Model):
factory = db.ReferenceProperty(Factory, collection_name="products")
You can then get all the products by using:
myFactory.products