If I have a class that has a back reference (e.g. something_set), how do I query for keys only on that set? The Query() constructor allows you to do this by setting keys_only=True, but as far as I can tell, filtering directly on the back reference always de-references the entities when it returns them.
You can't - keys_only needs to be set when the Query is constructed, and that's already done for you when you access something_set.
That said, foo.bar_set is just syntactic sugar for:
q = Foo.all().filter('bar =', foo_instance)
So you can construct that query yourself and pass keys_only=True to all() to get just the keys.
Related
db.ReferenceProperty allowed for a collection_name argument that specified how the reference object R could query for the objects that contain R as a reference.
I don't see a similar argument for ndb.KeyProperty. How do folks get around that?
It took a little getting used to when I made the switch to ndb, but it is actually simpler. The collection_name is just syntactic sugar for a query, and you can do the query yourself:
class MyEntity(ndb.Model):
    a_key = ndb.KeyProperty()

entities = MyEntity.query(MyEntity.a_key == some_key)
Let's say I have this really simple parent/child relationship (any Answer instance always has a Question parent):
class Answer(ndb.Model):
    content = ndb.StringProperty()
    timestamp = ndb.DateTimeProperty()

    def to_message(self):
        """Returns a protoRPC message object of the answer"""

class Question(ndb.Model):
    content = ndb.StringProperty()
    answers = ndb.KeyProperty(repeated=True, kind='Answer')

    def to_message(self):
        """Returns a protoRPC message object of the question"""
The two to_message methods are simply used to return a protoRPC object.
The question is: in the Question class's to_message method, if I want to fetch all child Answer instances, retrieve them, and use their own to_message method to turn them into a nice RPC message, is it better to:
Iterate over the answers repeated KeyProperty list
Do a query using a filter on the "parent" property, and iterate over the list it outputs
In terms of NDB access, the first method seems to be the best, but since we're going to go over the free limit anyway, I'm more wondering whether the datastore is more efficient at fetching entities than I am at iterating over that list.
Edit: The original question actually has a very simple and obvious answer: the first way.
The real question would be: in case I have to filter out some Answer entities based on their attributes (for instance timestamp), is it better to query using a filter, or to iterate over the list and use a condition to gather only the "interesting" entities?
With that schema you don't have to query anything, because you already have the keys of each answer as a list in question_entity.answers.
So you only have to fetch the answers using those keys. It's better to get all the answers in a single operation:
list_of_answers = ndb.get_multi(question_entity.answers)
(More info at NDB Entities and Keys)
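ndb.get_multi takes a plain list of keys. If that list could grow very large, a small (hypothetical) chunking helper keeps each batch to a bounded size; the helper name and batch size here are assumptions, not part of the ndb API:

```python
def chunked(keys, size=1000):
    """Split a list of keys into batches of at most `size` items."""
    return [keys[i:i + size] for i in range(0, len(keys), size)]

# Usage sketch (assumes question_entity.answers is a list of ndb.Key):
# all_answers = []
# for batch in chunked(question_entity.answers):
#     all_answers.extend(ndb.get_multi(batch))
```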
On the other hand, if you model that relationship with a KeyProperty in Answer:
class Answer(ndb.Model):
question = ndb.KeyProperty(Question)
content = ndb.StringProperty()
timestamp = ndb.DateTimeProperty()
def to_message():
"""Returns a protoRPC message object of the answer"""
or with ancestors:
answer = Answer(parent=question_entity.key)
In these cases you should use a normal query to retrieve the answers:
answers = Answer.query(Answer.question == question_entity.key)
or an ancestor query:
answers = Answer.query(ancestor=question_entity.key)
respectively.
This means two jobs: querying the index plus fetching the entities from the datastore. In conclusion, the first approach is cheaper for retrieving datastore data.
Using ndb.get_multi on the list of keys to fetch the Answers, and then iterating to call their to_message methods will be the most efficient.
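For the edited question (filtering fetched answers by an attribute in memory), here is a minimal, datastore-free sketch of the iterate-and-filter approach. The FakeAnswer class and answers_since helper are stand-ins for illustration; real code would filter the entities returned by ndb.get_multi:

```python
from datetime import datetime

# Stand-ins for Answer entities already fetched with ndb.get_multi()
class FakeAnswer(object):
    def __init__(self, content, timestamp):
        self.content = content
        self.timestamp = timestamp

def answers_since(answers, cutoff):
    """Keep only answers whose timestamp is at or after cutoff."""
    return [a for a in answers if a.timestamp >= cutoff]

answers = [
    FakeAnswer("old", datetime(2012, 1, 1)),
    FakeAnswer("new", datetime(2013, 6, 1)),
]
recent = answers_since(answers, datetime(2013, 1, 1))
# recent keeps only the answer created after the cutoff
```

Since the keys are already in hand, this in-memory filter avoids the index-scan cost of a filtered query, at the price of fetching entities you then discard.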
I want to get an entity key knowing entity ID and an ancestor.
ID is unique within entity group defined by the ancestor.
It seems to me that it's not possible using the ndb interface. As I understand the datastore, this may be because the operation would require a full index scan.
The workaround I used is to create a computed property in the model, which contains the id part of the key. I'm now able to do an ancestor query and get the key:
class SomeModel(ndb.Model):
    ID = ndb.ComputedProperty(lambda self: self.key.id())

    @classmethod
    def id_to_key(cls, identifier, ancestor):
        return cls.query(cls.ID == identifier,
                         ancestor=ancestor.key).get(keys_only=True)
It seems to work, but are there any better solutions to this problem?
Update
It seems that for the datastore the natural solution is to use full paths instead of identifiers. Initially I thought it'd be too burdensome. After reading dragonx's answer I redesigned my application. To my surprise, everything looks much simpler now. Additional benefits are that my entities will use less space and I won't need additional indexes.
I ran into this problem too. I think you do have the solution.
The better solution would be to stop using IDs to reference entities, and store either the actual key or a full path.
Internally, I use keys instead of IDs.
On my REST API, I used to do http://url/kind/id (where id looked like "123") to fetch an entity. I modified that to provide the complete ancestor path to the entity: http://url/kind/ancestor-ancestor-id (789-456-123). I'd then parse that string, generate a key, and then get by key.
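A hypothetical sketch of parsing such a path string back into the flat (kind, id) list a key is built from; the kind names and the helper itself are made up for illustration, and the final ndb.Key call is commented out since it only runs on App Engine:

```python
def path_to_pairs(path_str, kinds):
    """Turn '789-456-123' plus a list of kinds (root first) into the
    flat [kind1, id1, kind2, id2, ...] list that ndb.Key() expects."""
    ids = [int(part) for part in path_str.split('-')]
    if len(ids) != len(kinds):
        raise ValueError('path depth does not match the expected kinds')
    flat = []
    for kind, id_ in zip(kinds, ids):
        flat.extend([kind, id_])
    return flat

pairs = path_to_pairs('789-456-123', ['GrandKind', 'ParentKind', 'Kind'])
# key = ndb.Key(*pairs)  # would build the full key on App Engine
# entity = key.get()
```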
Since you have full information about your ancestor and you know your id, you could directly create your key and get the entity, as follows:
my_key = ndb.Key(Ancestor, ancestor.key.id(), SomeModel, id)
entity = my_key.get()
This way you avoid making a query that costs more than a get operation both in terms of money and speed.
Hope this helps.
I want to make a little addition to dragonx's answer.
In my application on front-end I use string representation of keys:
str(instance.key())
When I need to make some changes to an instance, even if it is a descendant, I use only the string representation of its key. For example, key_str is the argument from a request to delete an instance:
instance = Kind.get(key_str)
instance.delete()
My solution is using urlsafe to get item without worry about parent id:
pk = ndb.Key(Product, 1234)
usafe = LocationItem.get_by_id(5678, parent=pk).key.urlsafe()
# now we can fetch the entity back from the urlsafe string
item = ndb.Key(urlsafe=usafe).get()
print item
I have a couple of doubts regarding how NDB projection queries work and how the caching behaves behind the scenes.
So given a model similar to:
class Users(ndb.Model):
    user_name = ndb.StringProperty(required=True)
    user_email = ndb.StringProperty(required=True)
    user_password = ndb.StringProperty(required=True)

    @classmethod  # THIS ONE DOES NOT WORK
    def get_profile_info(cls, id):
        return ndb.Key(Users, id).get(projection=[Users.user_name])

    @classmethod  # THIS ONE WORKS
    def get_profile_info(cls, id):
        return Users.query(Users.key == ndb.Key(Users, id)).get(projection=[Users.user_name])
Why does the first classmethod raise a "TypeError: Unknown configuration option ('projection')"? Can't I simply call a projection on a direct get of a key, instead of having to query for a key?
Secondly, regarding caching, I'm not sure if I correctly understood this thread: NDB Caching When Using Projected Queries
Aren't projected queries cached? Does this mean its better to simply call a get() (and fetch the whole instance) so it is cached, instead of projecting?
Thanks in advance!
As per the error, a projection makes no sense when using get. From the docs: "It only gets values for those properties in the projection. It gets this data from the query index (and thus, properties in the projection must be indexed)." A plain get doesn't access the entity's properties via indexes. Note gvr's comment on caching in your referenced question.
I'm having problems converting a set problem into an efficient google app engine datastore solution. The problem is as follows. I have an entity defining a relationship between two objects, i.e. something like this:
struct Relation
{
    Obj1 int
    Obj2 int
    // other data
}
Now I want to perform the following query in an efficient manner: given a set of objects set = [obj1, obj2, obj3, obj4], I want to find all Relation entities (E) for which E.Obj1 ∈ set ∧ E.Obj2 ∈ set. Note that I do not know the set beforehand, so I cannot precompute all the entries in the set once. Is there any way to represent this problem in the datastore so that I can efficiently retrieve all the relationships that are part of a given set?
The equivalent GQL query is "SELECT * FROM Kind WHERE Obj1 IN :1 AND Obj2 IN :1", passing in the set as the first parameter. Unfortunately, IN queries expand out to one query for each term, so there's a combinatorial explosion of queries here - 16 queries in the case of a 4 element set. There's not really any way to avoid this with a standard query.
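The expansion the answer describes can be illustrated without the datastore: two IN filters over a 4-element set expand to one equality query per (Obj1, Obj2) pair, i.e. 4 × 4 = 16 underlying queries:

```python
from itertools import product

objects = ['obj1', 'obj2', 'obj3', 'obj4']
# Each pair corresponds to one underlying equality query:
# SELECT * FROM Kind WHERE Obj1 = a AND Obj2 = b
pairs = list(product(objects, repeat=2))
print(len(pairs))  # 16 queries for a 4-element set
```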