Is there a way to query all GAE Datastore entities that have a parent of a given kind? Each entity has a key that consists of a kind and an id/name, and we would like to query by that kind. Is it possible to use that information in a query, or do we have to store the kind in a separate property and then use that property in the query?
That's an interesting question. If you mean, given an Entity of kind A, where A's parent can be of kind B, C, ..., find all of the A's that have a parent of kind B, then I'm pretty sure that the answer is that this isn't doable in a single query, other than iterating across all As, examining their parent's kind. (If I discover otherwise, I'll revise this answer).
Given this problem, I'd store the parent kind as a separate (string) property.
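A minimal sketch of that idea, using plain Python dicts and tuples as a stand-in for datastore entities (this is not the App Engine API; `make_entity`, `parent_kind` and the key-path tuples are hypothetical names used only for illustration):

```python
# In-memory stand-in for datastore entities; not the App Engine API.
# A key path is a tuple of (kind, id_or_name) pairs, root first.

def parent_kind(key_path):
    """Return the kind of the direct parent, or None for a root entity."""
    if len(key_path) < 2:
        return None
    return key_path[-2][0]

def make_entity(key_path, **props):
    """Build an entity dict, copying the parent's kind into a property."""
    entity = dict(props)
    entity["key_path"] = key_path
    entity["parent_kind"] = parent_kind(key_path)  # denormalized copy
    return entity

entities = [
    make_entity((("B", 1), ("A", 10)), label="first"),
    make_entity((("C", 2), ("A", 11)), label="second"),
    make_entity((("B", 3), ("A", 12)), label="third"),
]

# Plays the role of an equality filter on the stored property.
with_b_parent = [e["label"] for e in entities if e["parent_kind"] == "B"]
print(with_b_parent)
```

Because the parent kind is copied at creation time, the lookup becomes an ordinary equality filter on one property instead of a scan over every A.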
I have some problems regarding entity framework and saving to the db.
As my current program works, it deserializes a json-object, which results in a list of objects that match the database. Each of these objects looks like this:
Every object is a parent with a relation to one child-object.
Each child object has a relation to one or many parent-objects.
After the deserialization is complete, each child-object has been created as a new object for every parent (meaning I get several instances of the same object).
When I try to save the objects to the db, this of course doesn't work, since I'm trying to insert many child-objects with the same pk. I can clearly see that context.childentity.local contains many objects with the same pk.
Is there any easy way to solve this issue? Can I in some way tell EF to refer all duplicates to the same object?
Best regards Anton
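One general way to collapse such duplicates is an identity map: keep the first instance seen for each primary key and point every parent at it before anything is handed to the ORM. The sketch below shows the idea in Python with plain dicts; it is not Entity Framework code, and `share_children` and the `pk`/`child` field names are assumptions made for illustration.

```python
def share_children(parents):
    """Point every parent at one shared child instance per primary key."""
    seen = {}  # child pk -> first instance encountered
    for parent in parents:
        child = parent["child"]
        # setdefault keeps the first instance and returns it for later duplicates
        parent["child"] = seen.setdefault(child["pk"], child)
    return list(seen.values())

parents = [
    {"pk": 1, "child": {"pk": 100, "name": "x"}},
    {"pk": 2, "child": {"pk": 100, "name": "x"}},  # duplicate instance of child 100
    {"pk": 3, "child": {"pk": 200, "name": "y"}},
]
unique_children = share_children(parents)
print(len(unique_children))
```

After this pass each primary key is represented by exactly one object, so only one insert per child would be attempted.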
I have two models which naturally exist in a parent-child relationship. IDs for the child are unique within the context of a single parent, but not necessarily globally, and whenever I want to query a specific child, I'll have the IDs for both parent and child available.
I can implement this two ways.
1. Make the datastore key name of each child entity be the string "<parent_id>,<child_id>", and do joins and splits to process the IDs.
2. Use parent keys.
Option 2 sounds like the obvious winner from a code perspective, but will it hurt performance on writes? If I never use transactions, is there still overhead for concurrent writes to different children of the same parent? Is the datastore smart enough to know that if I do two transactions in the same entity group which can't affect each other, they should both still apply? Or should parent keys be avoided if locking isn't necessary?
In terms of the datastore itself, parent/child relationships are conceptual only. That is, the actual entities are not joined in any way.
A key consists of a Parent Key, a Kind and an Id. The parent key embedded in the child's key is the only link between them.
Therefore, there isn't any real impact beyond the ability to do things transactionally. Similarly, siblings have no actual relationship, just a conceptual one.
For example, you can put an entity into the datastore referencing a parent which doesn't actually exist. That is entirely legitimate and oftentimes very useful.
So, the only difference between option 1 and option 2 is that with option 1 you have to do more heavy lifting and cannot take advantage of transactions or strongly consistent queries.
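The key structure described above can be modelled in a few lines of plain Python. This is a simplified sketch, not the datastore's real key class: `Key`, `root` and `same_entity_group` are illustrative names, and the `store` dict stands in for the datastore.

```python
from collections import namedtuple

# Simplified model of a key: a parent key (or None), a kind and an id.
Key = namedtuple("Key", ["parent", "kind", "id"])

def root(key):
    """Walk up the parent links to the root key of the entity group."""
    while key.parent is not None:
        key = key.parent
    return key

def same_entity_group(a, b):
    """Two keys are in the same entity group if they share a root."""
    return root(a) == root(b)

store = {}  # key -> entity properties

user = Key(None, "User", "alice")   # used as a parent but never stored
comment = Key(user, "Comment", 1)
tag = Key(user, "Tag", 7)

store[comment] = {"text": "hello"}
store[tag] = {"name": "python"}

print(user in store, same_entity_group(comment, tag))
```

The siblings are related only through the shared prefix of their keys, and the parent does not have to exist as a stored entity for that to work.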
Edit: The points above do not mention the limitation of 1 write per entity group per second. So, to directly answer the original question, using parent keys limits throughput if you want to write to many entities sharing the same parent key within a second outside of a single transaction.
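As a rough back-of-envelope illustration of that limit, the sketch below estimates how long a batch of separate writes would take if each entity group accepted one write per second. The rate constant is the figure quoted above; `min_seconds` is a hypothetical helper, and real behaviour (retries, bursts) is not modelled.

```python
from collections import Counter

WRITES_PER_GROUP_PER_SECOND = 1  # figure quoted in the answer above

def min_seconds(write_roots):
    """Estimate wall time when writes to the same root are serialized.

    write_roots holds the root (parent) id of each pending write.
    Different groups proceed in parallel, so the busiest group dominates.
    """
    per_group = Counter(write_roots)
    busiest = max(per_group.values()) if per_group else 0
    return busiest / WRITES_PER_GROUP_PER_SECOND

same_parent = min_seconds(["p1"] * 5)
different_parents = min_seconds(["p1", "p2", "p3", "p4", "p5"])
print(same_parent, different_parents)
```

Five writes under one parent take about five times as long as five writes spread over five parents in this simplified model.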
In general, if you don't need two entities to be updated or read in the same transaction, they should not be in the same entity group, i.e. share the same root in their key paths, as they would if one were a key-parent of the other. If they're in the same entity group, then concurrent updates to either entity will contend for the entire group, and some updates may need to be retried.
From your question, it sounds like "<parent_id>,<child_id>" is an appropriate key name for the child. If you need to access these IDs separately (such as to get all entities with a particular "<child_id>"), you can store them as indexed properties, and perform queries as needed.
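The joins and splits for such a key name are small. A minimal sketch, assuming neither ID can contain the separator (`make_key_name` and `split_key_name` are hypothetical helper names, and any other restrictions the datastore places on key names are not checked here):

```python
SEPARATOR = ","

def make_key_name(parent_id, child_id):
    """Join the two IDs into a single "<parent_id>,<child_id>" key name."""
    parent_id, child_id = str(parent_id), str(child_id)
    if SEPARATOR in parent_id or SEPARATOR in child_id:
        raise ValueError("IDs must not contain the separator")
    return parent_id + SEPARATOR + child_id

def split_key_name(key_name):
    """Recover (parent_id, child_id) from a composed key name."""
    parent_id, child_id = key_name.split(SEPARATOR, 1)
    return parent_id, child_id

name = make_key_name("alice", "c7")
print(name, split_key_name(name))
```

Rejecting IDs that contain the separator keeps the split unambiguous; the IDs come back as strings.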
For the transactions, you cannot do multiple concurrent writes
I have a list of keys, which should be children of an entity of MyModel (but some might not be), and I want to get the entities referred to by those keys in a transaction. One way of doing this is:
ifilter(None, ModelX.all().ancestor(Y).filter('__key__', xk).get() for xk in xkeys)
But it seems inefficient to run a separate query for each key. Is there a way to run IPN.get() on a list of keys in a transaction, preserving order, but ignoring those which don't belong to an entity group, instead of throwing a BadRequestError?
Assuming xkeys is a list of keys
results = [db.get(xk) for xk in xkeys]
But then, I think you can just do:
results = db.get(xkeys)
Oh I just read that you said (but some might not be). This is the problem. All the entities involved in the transaction must be in the same ancestor group. So to make this work, you've got to remove the entities that don't have the same ancestor from the list.
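A sketch of that filtering step, using key-path tuples and a dict in place of the real datastore (`batch_get` stands in for a single batched get call, and `get_children` is a hypothetical helper). It assumes the root of each key can be read from the key itself, without a datastore round trip.

```python
def batch_get(store, key_paths):
    """Stand-in for one batched get: one result per key, None if missing."""
    return [store.get(k) for k in key_paths]

def get_children(store, ancestor_root, key_paths):
    """Fetch, in order, only the keys whose root matches ancestor_root."""
    # Drop keys from other entity groups before touching the store.
    in_group = [k for k in key_paths if k[0] == ancestor_root]
    entities = batch_get(store, in_group)  # a single call for all keys
    # Drop keys that are in the group but have no stored entity.
    return [e for e in entities if e is not None]

y = ("MyModel", "y")
k1, k2 = (y, ("X", 1)), (y, ("X", 2))
k3 = (("Other", "z"), ("X", 3))      # belongs to a different group
missing = (y, ("X", 99))             # right group, no entity stored
store = {k1: "x1", k2: "x2", k3: "x3"}

results = get_children(store, y, [k2, k3, k1, missing])
print(results)
```

The list comprehension preserves the order of the input keys, so the results come back in the order they were requested.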
There didn't seem to be a way of doing what I asked, so I returned the keys from the transaction and did a db.get(ks) outside of it.
Currently, a lot of my code makes extensive use of ancestors to put and fetch objects. However, I'm looking to change some stuff around.
I initially thought that ancestors helped make querying faster if you knew who the ancestor of the entity you're looking for was. But I think it turns out that ancestors are mostly useful for transaction support. I don't make use of transactions, so I'm wondering if ancestors are more of a burden on the system here than a help.
What I have is a User entity, and a lot of other entities such as say Comments, Tags, Friends. A User can create many Comments, Tags, and Friends, and so whenever a user does so, I set the ancestor for all these newly created objects as the User.
So when I create a Comment, I set the ancestor as the user:
comment = Comment(aUser, key_name = commentId)
Now the only reason I'm doing this is strictly for querying purposes. I thought it would be faster when I wanted to get all comments by a certain user to just get all comments with a common ancestor rather than querying for all comments where authorEmail = userEmail.
So when I want to get all comments by a certain user, I do:
commentQuery = db.GqlQuery('SELECT * FROM Comment WHERE ANCESTOR IS :1', userKey)
So my question is, is this a good use of ancestors? Should each Comment instead have a ReferenceProperty that references the User object that created the comment, and filter by that?
(Also, my thinking was that using ancestors instead of an indexed ReferenceProperty would save on write costs. Am I mistaken here?)
You are right about the write cost: an ancestor is part of the key, which comes "free". Using a reference property will increase your write cost if the reference property is indexed.
Since you query on that reference property, it will need to be indexed.
Ancestors are not only important for transactions: in the HRD (the default datastore implementation), if you don't create each comment with the same ancestor, the queries will not be strongly consistent.
-- Adding Nick's comment ---
Every entity with the same parent will be in the same entity group, and writes to entity groups are serialized, so using ancestors here will slow things down iff you're writing multiple entities concurrently. Since all the entities in a group are 'owned' by the user that forms the root of the group in your instance, though, this shouldn't be a problem - and in fact, what you're doing is actually a recommended design pattern.
I'm developing an application with Google App Engine and stumbled across the following scenario, which can perhaps be described as "MVP-lite".
When modeling many-to-many relationships, the standard property to use is the ListProperty. Most likely, your list is comprised of the foreign keys of another model.
However, in most practical applications, you'll usually want at least one more detail when you get a list of keys - the object's name - so you can construct a nice hyperlink to that object. This requires looping through your list of keys and grabbing each object to use its "name" property.
Is this the best approach? Because "reads are cheap", is it okay to get each object even if I'm only using one property for now? Or should I use a special property like tipfy's JsonProperty to save a (key, name) "tuple" to avoid the extra gets?
Though datastore reads are comparatively cheaper than datastore writes, they can still add significant time to a request handler. Including the objects' names as well as their foreign keys sounds like a good use of denormalization (e.g., use two list properties to simulate a tuple: one contains the foreign keys and the other contains the corresponding names).
If you decide against this denormalization, then I suggest you batch fetch the entities which the foreign keys refer to (rather than getting them one by one) so that you can at least minimize the number of round trips you make to the datastore.
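To make the round-trip difference concrete, here is a small sketch with a fake in-memory datastore that counts calls. `FakeDatastore` is purely illustrative and not part of any App Engine library; its `get` accepts either one key or a list of keys.

```python
class FakeDatastore:
    """In-memory stand-in that counts how many get calls are made."""

    def __init__(self, entities):
        self.entities = entities
        self.round_trips = 0

    def get(self, keys):
        """Accept one key or a list of keys; each call is one round trip."""
        self.round_trips += 1
        if isinstance(keys, list):
            return [self.entities.get(k) for k in keys]
        return self.entities.get(keys)

ds = FakeDatastore({
    "k1": {"name": "First"},
    "k2": {"name": "Second"},
    "k3": {"name": "Third"},
})
keys = ["k1", "k2", "k3"]

# One get per key: as many round trips as there are keys.
one_by_one = [ds.get(k)["name"] for k in keys]
trips_one_by_one = ds.round_trips

# One batched get for the whole list: a single round trip.
ds.round_trips = 0
batched = [entity["name"] for entity in ds.get(keys)]
trips_batched = ds.round_trips

print(trips_one_by_one, trips_batched)
```

Both approaches return the same names; only the number of calls differs.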
When modeling one-to-many (or in some cases, many-to-many) relationships, the standard property to use is the
No, when modeling one-to-many relationships, the standard property to use is a ReferenceProperty, on the 'many' side. Then, you can use a query to retrieve all matching entities.
Returning to your original question: If you need more data, denormalize. Store a list of titles alongside the list of keys.
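A minimal sketch of that denormalization, using a plain dict in place of a model with two list properties. The field names `item_keys` and `item_titles`, the helpers, and the `/items/` URL prefix are all hypothetical.

```python
from html import escape

def add_item(entity, key, title):
    """Append to both lists together so the positions stay aligned."""
    entity["item_keys"].append(key)
    entity["item_titles"].append(title)

def item_links(entity, url_prefix="/items/"):
    """Build one hyperlink per item without fetching the items themselves."""
    return [
        '<a href="%s%s">%s</a>' % (url_prefix, key, escape(title))
        for key, title in zip(entity["item_keys"], entity["item_titles"])
    ]

entity = {"item_keys": [], "item_titles": []}
add_item(entity, "k1", "First")
add_item(entity, "k2", "Second")

links = item_links(entity)
print(links)
```

The trade-off is that a title change on the referenced object also has to be written to every entity that stores a copy of it.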