How to access entities from many to many relationships - google-app-engine

Here is the explanation. I have two entities: House, Person. In my system one person may own multiple houses and one house may have multiple owners. So I create a table (aka entity) called HouseOwnership. I want to be able to make two different kinds of queries against HouseOwnership:
Given a houseId, get all its owners
Given a personId, get all the houses owned
So for HouseOwnership, I do
@Entity
class HouseOwnership {
    @Id Long id; // every Objectify entity needs an @Id field
    @Load
    private Ref<House> houseRef;
    @Load
    private Ref<Person> personRef;
}
How do I make my queries with OfyService.ofy()?
I am tempted to do
owners = OfyService.ofy().load().type(HouseOwnership.class).ancestor(house).list();
and
houses = OfyService.ofy().load().type(HouseOwnership.class).ancestor(person).list();
but for this I would have to make both references @Parent. Am I allowed to do that? How do I make the queries?
Also, I only have the ids, not the actual objects, so I would have to create the objects from the ids, which I can do. But I am wondering whether there is an easier way.

An entity can have only one parent.
You don't need to make your HouseOwnership entity a child of any entity.
You make a simple query to get all HouseOwnership entities where houseRef property equals a given House key, or personRef property equals a given Person key, or both.
You can always make a Key from an ID for entities that have no parents.
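To make the query-based approach concrete, here is a minimal in-memory sketch in Python. It is not the Objectify API: keys are assumed to be plain (kind, id) tuples and the "datastore" is just a list of dicts. It only shows that both lookups are ordinary property filters on the join entity, and that a key for a root entity can be built from a bare ID.

```python
from typing import NamedTuple


class Key(NamedTuple):
    """Hypothetical stand-in for the key of a root entity (no parent)."""
    kind: str
    id: int


# Each HouseOwnership record holds two references; no parent is involved.
ownerships = [
    {"house": Key("House", 1), "person": Key("Person", 10)},
    {"house": Key("House", 1), "person": Key("Person", 11)},
    {"house": Key("House", 2), "person": Key("Person", 10)},
]


def owners_of(house_id):
    """Like filtering HouseOwnership on houseRef == the House key."""
    house_key = Key("House", house_id)  # key built from the bare ID
    return [o["person"] for o in ownerships if o["house"] == house_key]


def houses_of(person_id):
    """Like filtering HouseOwnership on personRef == the Person key."""
    person_key = Key("Person", person_id)
    return [o["house"] for o in ownerships if o["person"] == person_key]


print(owners_of(1))   # [Key(kind='Person', id=10), Key(kind='Person', id=11)]
print(houses_of(10))  # [Key(kind='House', id=1), Key(kind='House', id=2)]
```

In Objectify 4 and later, fields are unindexed by default, so houseRef and personRef would need @Index before they can be used in such filters.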

You almost certainly want to model this as an @Index Set<Ref<Person>> owners; property on House. Creating an extra relationship entity creates a significant amount of overhead.
Don't try to map schemas literally from relational models - use the document structure to your advantage.
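A toy in-memory sketch of that document-style shape, in Python rather than Objectify (the dict-based store and helper names are assumptions for illustration): each House carries its set of owner IDs, so "owners of a house" is a single lookup, and "houses owned by a person" is a filter on the multi-valued property. It mirrors the Datastore rule that an equality filter on a list property matches an entity if any one of its values is equal.

```python
# Hypothetical store: house ID -> house record with a set of owner IDs.
houses = {
    1: {"address": "1 Main St", "owners": {10, 11}},
    2: {"address": "2 Oak Ave", "owners": {10}},
}


def owners_of(house_id):
    # One lookup by key; the owners are stored on the House itself.
    return sorted(houses[house_id]["owners"])


def houses_owned_by(person_id):
    # Like an equality filter on a multi-valued property:
    # a house matches if ANY of its owner values equals person_id.
    return sorted(hid for hid, h in houses.items() if person_id in h["owners"])


def add_owner(house_id, person_id):
    # Adding an owner is a single write to one House entity.
    houses[house_id]["owners"].add(person_id)


add_owner(2, 12)
print(owners_of(2))         # [10, 12]
print(houses_owned_by(10))  # [1, 2]
```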

Related

NDB query with projection on an attribute used in .IN()

Let's say I have a model:
class Pet(ndb.Model):
    age = ndb.IntegerProperty(indexed=False)
    name = ndb.StringProperty(indexed=True)
    owner = ndb.KeyProperty(indexed=True)
And I have a list of keys named owners. To do a query for Pets I would do:
pets = Pet.query(Pet.owner.IN(owners)).fetch()
The problem is that this query returns the whole entity.
How can I do a projected query and get just the owner and the name?
Or how should I structure the data to just get the name and the owner.
I can do a projection for the name, but then I lose the reference from the pet to the owner. And owner can't be in the projection.
As you have noticed, you can't do that with the exact context you mentioned, because you hit one of the Limitations on projections:
Properties referenced in an equality (=) or membership (IN) filter cannot be projected.
Since owner is used in an IN filter, it can't be projected. Because you need the owner and can't project it, you'll have to drop the projection, so you'll always get the entire entity.
One alternative would be to split your entity into 2 peer entities, always in a 1:1 relationship, using the same entity IDs:
class PetA(ndb.Model):
    name = ndb.StringProperty(indexed=True)
    owner = ndb.KeyProperty(indexed=True)
class PetB(ndb.Model):
    age = ndb.IntegerProperty(indexed=False)
This way you can do the same query, except on PetA kind instead of the original Pet and the result you'd get would be the equivalent of the original projection query you were seeking.
Unfortunately this will only work with one or a very few such projection queries for the same entity, otherwise you'd need to split the entity in too many pieces. So you may have to compromise.
You can find more details about the entity splitting in re-using an entity's ID for other entities of different kinds - sane idea?
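The peer-entity split can be sketched in plain Python, assuming two dicts stand in for the PetA and PetB kinds and share the same IDs. The "query" on PetA returns only the name and owner, which is what the projection was meant to give, and the rest of the pet stays reachable by reusing the ID.

```python
# Hypothetical stores for the two kinds; both are keyed by the same pet ID.
pet_a = {  # PetA: the small, frequently queried part
    1: {"name": "Rex", "owner": "alice"},
    2: {"name": "Tom", "owner": "bob"},
    3: {"name": "Kit", "owner": "carol"},
}
pet_b = {  # PetB: everything else
    1: {"age": 3},
    2: {"age": 5},
    3: {"age": 1},
}


def query_pet_a_owner_in(owners):
    """Stand-in for PetA.query(PetA.owner.IN(owners)): (id, name, owner) tuples."""
    return [(pid, p["name"], p["owner"])
            for pid, p in sorted(pet_a.items())
            if p["owner"] in owners]


def get_pet_b(pet_id):
    # The peer entity is found directly from the shared ID; no query needed.
    return pet_b[pet_id]


results = query_pet_a_owner_in({"alice", "carol"})
print(results)                   # [(1, 'Rex', 'alice'), (3, 'Kit', 'carol')]
print(get_pet_b(results[0][0]))  # {'age': 3}
```

With ndb, the PetB key for a PetA result would be built from the shared ID, e.g. ndb.Key(PetB, pet_a_entity.key.id()), and fetched with a plain get rather than a query.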

How would I achieve this using Google App Engine Datastore?

I am a beginner to Datastore and I am wondering how I should use it to achieve what I want to do.
For example, my app needs to keep track of customers and all their purchases.
Coming from relational database, I can achieve this by creating [Customers] and [Purchases] table.
In Datastore, I can make [Customers] and [Purchases] kinds.
Where I am struggling is the structure of the [Purchases] kind.
If I make [Purchases] as the child of [Customers] kind, would there be one entity in [Customers] and one entity in [Purchases] that share the same key? Does this mean inside of this [Purchases] entity, I would have a property that just keeps increasing for each purchase they make?
Or would I have one [Purchases] entity for each purchase they make, and in each of these entities a property that points to an entity in the [Customers] kind?
How does Datastore perform in these scenarios?
Sounds like you don't fully understand ancestors. Let's go with the non-ancestor version first, which is a legitimate way to go:
class Customer(ndb.Model):
    # customer data fields
    name = ndb.StringProperty()
class Purchase(ndb.Model):
    customer = ndb.KeyProperty(kind=Customer)
    # purchase data fields
    price = ndb.IntegerProperty()
This is the basic way to go. You'll have one entity in the datastore for each customer, and one entity for each purchase, with a KeyProperty that points to the customer.
If you have a purchase and need to find the associated customer, it's right there:
purchase_entity.customer.get()
If you have a Customer, you can issue a query to find all the purchases that belong to the customer:
Purchase.query(Purchase.customer == customer_entity.key).fetch()
In this case, whenever you write either a customer or a purchase entity, the GAE datastore will write that entity to any one of the datastore machines running in the cloud that isn't busy. You can get really high write throughput this way. However, when you query for all the purchases of a given customer, you read back whatever is currently in the indexes. If a new purchase was added but the indexes haven't been updated yet, you may get stale data (eventual consistency). You're stuck with this behavior unless you use ancestors.
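The trade-off above can be made concrete with a toy model in Python. It assumes, purely for illustration, that an entity write is visible to a lookup by key immediately while the index that queries read is applied later; real Datastore replication is more involved, and the class and method names here are hypothetical.

```python
class ToyDatastore:
    """Toy model: entity writes land at once, index updates are applied later."""

    def __init__(self):
        self.entities = {}  # key -> entity (what a get by key reads)
        self.index = {}     # customer key -> set of purchase keys (what queries read)
        self.pending = []   # index updates not yet applied

    def put_purchase(self, key, customer_key, price):
        self.entities[key] = {"customer": customer_key, "price": price}
        self.pending.append((customer_key, key))  # the index lags behind

    def get(self, key):
        # A lookup by key sees the latest write.
        return self.entities.get(key)

    def query_purchases(self, customer_key):
        # Queries go through the index, so they can miss recent writes.
        return sorted(self.index.get(customer_key, set()))

    def apply_index_updates(self):
        for customer_key, key in self.pending:
            self.index.setdefault(customer_key, set()).add(key)
        self.pending.clear()


ds = ToyDatastore()
ds.put_purchase("p1", "c1", 100)
print(ds.get("p1"))              # {'customer': 'c1', 'price': 100}
print(ds.query_purchases("c1"))  # [] (stale: index not updated yet)
ds.apply_index_updates()
print(ds.query_purchases("c1"))  # ['p1']
```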
Now as for the ancestor version. The basic concept is essentially the same. You still have a customer entity, and separate entities for each purchase. The purchase is NOT part of the customer entity. However, when you create a purchase using a customer as an ancestor, it (roughly) means that the purchase is stored on the same machine in the datastore that the customer entity was stored on. In this case, your write performance is limited to the performance of that one machine, and is advertised as one write per second. As a benefit, though, you can query that machine using an ancestor query and get an up-to-date list of all the purchases of a given customer.
The syntax for using ancestors is a bit different. The customer part is the same. However, when you create purchases, you'd create them as:
purchase1 = Purchase(parent=customer_entity.key)
purchase2 = Purchase(parent=customer_entity.key)
This example creates two separate purchase entities. Each purchase will have a different key, and the customer has its own key as well. However, each purchase key will have the customer_entity's key embedded in it. So you can think of the purchase key being twice as long. However, you don't need to keep a separate KeyProperty() for the customer anymore, since you can find it in the purchases key.
class Purchase(ndb.Model):
    # you don't need a KeyProperty for the customer anymore
    # purchase data fields
    price = ndb.IntegerProperty()
purchase.key.parent().get()
And in order to query for all the purchases of a given customer:
Purchase.query(ancestor=customer_entity.key).fetch()
The actual structure of the entities doesn't change much; mostly the syntax does. But ancestor queries are fully consistent.
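A rough way to picture ancestor keys, assuming a key is modelled as a tuple of (kind, id) pairs from the root down (the same idea as an ndb key's path, though these helpers are hypothetical): parent() drops the last pair, and an ancestor query keeps keys whose path starts with the ancestor's path.

```python
# Hypothetical model: a key is a tuple of (kind, id) pairs, root first.
customer_key = (("Customer", 1),)
other_customer_key = (("Customer", 2),)


def child_key(parent, kind, entity_id):
    # A child key embeds the parent's whole path, so it is "longer".
    return parent + ((kind, entity_id),)


def parent_of(key):
    # Like key.parent(): drop the last pair; a root key has no parent.
    return key[:-1] if len(key) > 1 else None


def ancestor_query(keys, kind, ancestor):
    # Like an ancestor query: keys of this kind whose path starts with the
    # ancestor's path (descendants at any depth match).
    return [k for k in keys
            if k[-1][0] == kind and k[:len(ancestor)] == ancestor]


purchases = [
    child_key(customer_key, "Purchase", 7),
    child_key(customer_key, "Purchase", 8),
    child_key(other_customer_key, "Purchase", 9),
]
print(parent_of(purchases[0]))  # (('Customer', 1),)
print(ancestor_query(purchases, "Purchase", customer_key))
# [(('Customer', 1), ('Purchase', 7)), (('Customer', 1), ('Purchase', 8))]
```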
The third option that you kinda describe is not recommended. I'm just including it for completeness. It's a bit confusing, and would go something like this:
class Purchase(ndb.Model):
    # purchase data fields
    price = ndb.IntegerProperty()
class Customer(ndb.Model):
    purchases = ndb.StructuredProperty(Purchase, repeated=True)
This is a special case which uses ndb.StructuredProperty. In this case, you will only have a single Customer entity in the datastore. While there's a class for purchases, your purchases won't get stored as separate entities - they'll just be stored as data within the Customer entity.
There may be a couple of reasons to do this. You're only dealing with one entity, so your data fetch will be fully consistent. You also have reduced write costs when you have to update a bunch of purchases, since you're only writing a single entity. And you can still query on the properties of the Purchase class. However, this was designed for a limited number of repeated objects, not hundreds or thousands. And each entity is limited to a total size of 1MB, so you'll eventually hit that limit and won't be able to add more purchases.
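A rough sketch of why the embedded approach runs out of room, in plain Python: all purchases live inside one customer record, and every append rewrites the whole thing. The JSON size is only an assumed proxy for the real entity encoding, and the limit constant approximates the 1MB cap; the demo uses a deliberately tiny limit so it finishes quickly.

```python
import json

# Approximation of the ~1MB per-entity limit; real encoded sizes differ from JSON.
MAX_ENTITY_BYTES = 1024 * 1024


class EntityTooLarge(Exception):
    pass


def add_purchase(customer, purchase, limit=MAX_ENTITY_BYTES):
    """Append a purchase to the embedded list, refusing to exceed the size limit."""
    candidate = dict(customer, purchases=customer["purchases"] + [purchase])
    size = len(json.dumps(candidate).encode("utf-8"))
    if size > limit:
        raise EntityTooLarge(f"entity would be {size} bytes (limit {limit})")
    return candidate  # the whole entity is written again on every update


customer = {"name": "Ann", "purchases": []}
count = 0
try:
    while True:
        customer = add_purchase(customer, {"price": 100, "item": "x" * 50}, limit=4096)
        count += 1
except EntityTooLarge:
    pass
print(count)  # how many purchases fit under the (deliberately small) limit
```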
(from your personal tags I assume you are a java guy, using GAE+java)
First, don't use the ancestor relationships - this has a special purpose to define the transaction scope (aka Entity Groups). It comes with several limitations and should not be used for normal relationships between entities.
Second, do use an ORM instead of the low-level API: my personal favourite is Objectify. GAE also offers JDO and JPA.
In GAE relations between entities are simply created by storing a reference (a Key) to an entity inside another entity.
In your case there are two ways to create a one-to-many relationship between a Customer and its Purchases.
@Entity
public class Customer {
    @Id
    public Long customerId; // 'Long' identifiers are autogenerated
    // option one: parent-to-children references
    public List<Key<Purchase>> purchases; // one-to-many, parent-to-child
}
@Entity
public class Purchase {
    @Id
    public Long purchaseId;
    // option two: child-to-parent reference
    public Key<Customer> customer;
}
Whether you use option 1 or option 2 (or both) depends on how you plan to access the data. The difference is whether you use a get or a query, and the two differ in cost and speed: a get is always faster and cheaper.
Note: references in GAE Datastore are manual, there is no referential integrity: deleting one part of a relationship will produce no warning/error from Datastore. When you remove entities it's up to your code to fix references - use transactions to update two entities consistently (hint: no need to use Entity Groups - to update two entities in a transaction you can use XG transactions, enabled by default in objectify).
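The two options, and the manual clean-up they imply, can be sketched in Python with plain dicts standing in for Objectify entities and keys (an assumption for illustration). Option 1 resolves a customer's purchases with a get of stored keys, option 2 with a query on the child-to-parent reference, and deleting a purchase has to fix the customer's key list itself because nothing enforces referential integrity.

```python
customers = {"c1": {"purchases": ["p1", "p2"]}}      # option 1: parent holds child keys
purchases = {"p1": {"customer": "c1", "price": 10},  # option 2: child holds parent key
             "p2": {"customer": "c1", "price": 20}}


def purchases_by_get(customer_id):
    # Option 1: the keys are already known, so this is a (batch) get by key.
    return [purchases[k] for k in customers[customer_id]["purchases"]]


def purchases_by_query(customer_id):
    # Option 2: filter on the reference property, i.e. a query.
    return [p for p in purchases.values() if p["customer"] == customer_id]


def delete_purchase(purchase_id):
    # No referential integrity: the code must remove the dangling reference.
    # In a real app both writes would go in one (cross-group) transaction.
    purchase = purchases.pop(purchase_id)
    customers[purchase["customer"]]["purchases"].remove(purchase_id)


delete_purchase("p1")
print(purchases_by_get("c1"))    # [{'customer': 'c1', 'price': 20}]
print(purchases_by_query("c1"))  # [{'customer': 'c1', 'price': 20}]
```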
I think the best approach in this specific case would be to use a parent structure.
class Customer(ndb.Model):
    pass
class Purchase(ndb.Model):
    pass
customer = Customer()
customer_key = customer.put()
purchase = Purchase(parent=customer_key)
purchase.put()
You could then get all purchases of a customer using
purchases = Purchase.query(ancestor=customer_key).fetch()
or get the customer who made the purchase using
customer = purchase.key.parent().get()
If you use the purchase count a lot, it might be a good idea to keep track of it on the customer.
You could do that using a _pre_put_hook or a _post_put_hook:
class Customer(ndb.Model):
    count = ndb.IntegerProperty(default=0)  # default avoids None += 1
class Purchase(ndb.Model):
    def _post_put_hook(self, future):
        # TODO check whether this is a new entity.
        customer = self.key.parent().get()
        customer.count += 1
        customer.put()
It would also be good practice to do this in a transaction, so that if either the purchase or the count update fails, neither is saved.
@ndb.transactional
def save_purchase(purchase):
    purchase.put()
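What the transaction buys you can be shown with a toy all-or-nothing helper in Python: it snapshots a dict-based store, runs both writes, and restores the snapshot if anything raises. This only imitates the atomicity of @ndb.transactional; the store layout and function names are hypothetical.

```python
import copy

store = {"customer": {"count": 0}, "purchases": []}


def transactional(fn):
    """Toy all-or-nothing wrapper: roll the store back if fn raises."""
    def wrapper(*args, **kwargs):
        snapshot = copy.deepcopy(store)
        try:
            return fn(*args, **kwargs)
        except Exception:
            store.clear()
            store.update(snapshot)  # undo every write made inside fn
            raise
    return wrapper


@transactional
def save_purchase(purchase, fail=False):
    store["purchases"].append(purchase)  # write 1: the purchase
    if fail:
        raise RuntimeError("simulated failure between the two writes")
    store["customer"]["count"] += 1      # write 2: the counter


save_purchase({"price": 10})
try:
    save_purchase({"price": 20}, fail=True)
except RuntimeError:
    pass
print(store)  # {'customer': {'count': 1}, 'purchases': [{'price': 10}]}
```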

Database design to support dynamic entities

OK, I don't know whether this question belongs here, but let me know if it doesn't.
I have some entities that have almost the same attributes; they differ in maybe 2-3 columns.
Because of those differing columns, I can't create one table whose columns are the union of every entity's attributes, because each new entity type would require changing the table design to add columns specific to that type.
Instead, the design that currently works gives every specific entity its own table.
But if a new type of entity comes along, I must create a new table, which is a bad idea.
How can I create one table that holds the attributes shared by every entity type, plus some additional mechanism to record entity-unique attributes?
The idea is to add new types of objects easily, without changing the database design, configuring only the part that deals with the unique columns.
P.S. Maybe I'm not being clear; I will add more description if needed.
I had a design like that once. I created a table that held all the shared properties, and separate tables for the distinct values. I used joins to match a specific entity to its shared table row. I had fewer than 10 entity types, so I just updated the views that used unions whenever I added a new one. But if you use a naming convention, you could write stored procs that find the table names dynamically and do the unions and joins on the fly. In my case, I used a base class and specific classes to make a custom data layer.
Another possibility is to have a generic table that's basically name/value pairs, and a table that represents your shared properties. By joining the tables together, you could have any number of entity-specific properties for your entities. It's not very efficient and the SQL gets weird, but I've seen it done.
One solution is to store the common parts in one table, and the specific parts in tables specific to that entity.
For example, to model a set of people, some of whom are managers:
Person Table
PersonID
PersonName
Manager Table
ManagerID
PersonID
DepartmentManaged
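Here is the Person/Manager layout as runnable SQL through Python's built-in sqlite3 module. The column types and sample rows are assumptions; the point is that shared attributes live once in Person and the type-specific table joins back to it via PersonID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (
        PersonID   INTEGER PRIMARY KEY,
        PersonName TEXT NOT NULL
    );
    -- Type-specific attributes live in their own table and point back to Person.
    CREATE TABLE Manager (
        ManagerID         INTEGER PRIMARY KEY,
        PersonID          INTEGER NOT NULL UNIQUE REFERENCES Person(PersonID),
        DepartmentManaged TEXT
    );
    INSERT INTO Person VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO Manager (PersonID, DepartmentManaged) VALUES (1, 'Sales');
""")

# All people, with manager-specific data where it exists.
rows = conn.execute("""
    SELECT p.PersonName, m.DepartmentManaged
    FROM Person p LEFT JOIN Manager m ON m.PersonID = p.PersonID
    ORDER BY p.PersonID
""").fetchall()
print(rows)  # [('Alice', 'Sales'), ('Bob', None)]
```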
As soon as you go down the path of having one table with variable field meanings - effectively an Entity Attribute Value design - you find yourself in querying hell.
Perhaps not the best or most academic, but what about this kind of "open structure" ?
MainTable: all common fields
SpecialProperties: extra properties, as required
- MainRecordId (P, F->MainTable)
- PropertyName (P)
- PropertyText
- PropertyValue (for numeric values)
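The "open structure" can be tried out the same way with sqlite3. The table and column names follow the outline above; the sample data and the pivot query are assumptions, added to show that each extra property has to be pulled back out with its own join, which is where EAV querying gets awkward.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MainTable (
        MainRecordId INTEGER PRIMARY KEY,
        Name         TEXT NOT NULL          -- common fields go here
    );
    CREATE TABLE SpecialProperties (
        MainRecordId  INTEGER NOT NULL REFERENCES MainTable(MainRecordId),
        PropertyName  TEXT NOT NULL,
        PropertyText  TEXT,
        PropertyValue REAL,                 -- numeric values
        PRIMARY KEY (MainRecordId, PropertyName)
    );
    INSERT INTO MainTable VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO SpecialProperties VALUES
        (1, 'color',  'red', NULL),
        (1, 'weight', NULL,  2.5),
        (2, 'weight', NULL,  7.0);
""")

# One join per extra property just to read two attributes back as columns.
rows = conn.execute("""
    SELECT m.Name, c.PropertyText AS color, w.PropertyValue AS weight
    FROM MainTable m
    LEFT JOIN SpecialProperties c
           ON c.MainRecordId = m.MainRecordId AND c.PropertyName = 'color'
    LEFT JOIN SpecialProperties w
           ON w.MainRecordId = m.MainRecordId AND w.PropertyName = 'weight'
    ORDER BY m.MainRecordId
""").fetchall()
print(rows)  # [('Widget', 'red', 2.5), ('Gadget', None, 7.0)]
```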

NDB Modeling One-to-one with KeyProperty

I'm quite new to ndb but I've already understood that I need to rewire a certain area in my brain to create models. I'm trying to create a simple model - just for the sake of understanding how to design an ndb database - with a one-to-one relationship: for instance, a user and his info. After searching around a lot - found documentation but it was hard to find different examples - and experimenting a bit (modeling and querying in a couple of different ways), this is the solution I found:
from google.appengine.ext import ndb
class Monster(ndb.Model):
    name = ndb.StringProperty()
    @classmethod
    def get_by_name(cls, name):
        return cls.query(cls.name == name).get()
    def get_info(self):
        return Info.query(Info.monster == self.key).get()
class Info(ndb.Model):
    monster = ndb.KeyProperty(kind='Monster')
    address = ndb.StringProperty()
a = Monster(name="Dracula")
a.put()
b = Info(monster=a.key, address="Transilvania")
b.put()
print(Monster.get_by_name("Dracula").get_info().address)
NDB doesn't support joins, so the "join" has to be emulated using class methods and properties. With the above system I can easily reach a property in the second model (Info) through a unique property in the first (in this case "name"; suppose no two monsters share a name).
However, if I want to print a list of 100 monster names and their addresses, the second model (Info) will be queried 100 times.
Question: is there a better way to model this to increase performance?
If it's truly a one-to-one relationship, why are you creating two models? In your example the Address entity cannot be shared with any other Monster, so why not put the address details in the Monster?
There are some reasons why you wouldn't.
1. Address could become large, making it inefficient to retrieve hundreds of properties when you only need a couple (though projection queries may help there).
2. You change your mind and want to see all monsters that live in Transylvania. In that case you would create the Address entity, and the Monster would have a KeyProperty that points to the Address. This obviously fails once you work out that some monsters live in multiple places (werewolves: London, Transylvania, New York ;-), in which case you need either a repeated KeyProperty on the Monster or an intermediate entity that points to both the monster and the address. In your case I don't think monsters on the whole have that many documented addresses ;-)
Also if you are uniquely identifying monsters by name you should consider storing the name as part of the key. Doing a Monster.get_by_id("dracula") is quicker than a query by name.
As I wrote (poorly) in the comment: if the first reason above holds and it is a true one-to-one relationship, I would create Address as a child entity (with Monster as the parent/ancestor in its key). This allows you to:
1. let other entities point to the Address;
2. fetch a bunch of child entities with a single ancestor query;
3. get a monster and all its owned entities, again with an ancestor query.
If you have a bunch of entities that should only exist while a Monster instance exists and they are not children, then you have to query every entity type for a KeyProperty matching the Monster's key. If these entities are not PolyModels, you need a separate query for each entity type (and you have to know which types to query, which involves a registry of some kind or hard-coding).
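The performance point behind this advice is that a key you can compute is a key you can fetch in a batch. Here is a toy Python sketch, assuming each Info is stored under a key derived from its Monster's key (for example as a child with a fixed ID) and counting round trips on a hypothetical in-memory store: 100 per-monster queries become one batch lookup, in the spirit of ndb.get_multi.

```python
class CountingStore:
    """Hypothetical key-value store that counts round trips."""

    def __init__(self, data):
        self.data = data
        self.round_trips = 0

    def query_one(self, predicate):
        # One round trip per query (the question's get_info() pattern).
        self.round_trips += 1
        return next((v for v in self.data.values() if predicate(v)), None)

    def get_multi(self, keys):
        # One round trip for a whole batch of keys.
        self.round_trips += 1
        return [self.data.get(k) for k in keys]


def info_key(monster_key):
    # Assumed layout: Info is a child of the Monster with a fixed ID,
    # so its key can be computed without a query.
    return monster_key + (("Info", 1),)


monster_keys = [(("Monster", f"monster{i}"),) for i in range(100)]
store = CountingStore({info_key(k): {"monster": k, "address": f"addr{i}"}
                       for i, k in enumerate(monster_keys)})

slow = [store.query_one(lambda v, k=k: v["monster"] == k) for k in monster_keys]
print(store.round_trips)  # 100

store.round_trips = 0
fast = store.get_multi([info_key(k) for k in monster_keys])
print(store.round_trips)  # 1
print(slow == fast)       # True
```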
I suspect what you may be trying could be achieved by using elements described in the link below
Have a look at "Operations on Multiple Keys or Entities" "Expando Models" "Model Hooks"
https://developers.google.com/appengine/docs/python/ndb/entities
(This is probably more a comment than an answer)

Objectify and entity groups

Here is a question for all the objectify/ app engine gurus out there:
I am creating objects with a parent/child relationship by storing the key of the parent object in the child.
This is stored in an object of type Key. For example, let's say I have a car object and tire objects.
The tire objects store the parent key in a variable of type Key.
@Entity
public class Tire {
    @Id Long id;
    Key<Car> parentKey;
    int size;
}
In my app I will need to get all the tires for a given car. I can do this with a query:
List<Tire> tires = ofy.query(Tire.class).filter("parentKey", carKey).list();
Is this an appropriate way to accomplish this? Will it cause any issues with entity groups? Will it be efficient for a large number of cars and tires?
Right now you're not creating a parent/child relationship, at least not as App Engine defines one. Check out the documentation: a parent/child relationship can speed things up because the car and its tyres will be stored physically together, but they can be difficult to remove if at some point they are no longer needed.
To create a parent/child relationship using Objectify, add the @Parent annotation:
// Use com.googlecode.objectify.Key instead of
// com.google.appengine.api.datastore.Key
@Parent Key<Car> parentKey;
Now, in order to get all the tires that belong to a specific car:
List<Tire> tires = ofy().query(Tire.class).ancestor(carKey).list();
I'm using exactly the same approach with no problems.
I don't see any conflicts with entity groups, and it works fine for large groups (at least for thousands of entities).
P.S. If you just need to fetch data that belongs together, you don't need GAE entity groups. Moreover, entity groups are best for transactions, not for filtering.
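One practical difference between the two approaches in this thread: with a plain Key<Car> field the car is just a property, while with @Parent the car becomes part of the tire's key, and an entity's key cannot be changed after creation. A toy Python sketch (tuples as key paths; hypothetical helpers, not Objectify) shows that moving a tire to another car is a field update in the first case but a delete-and-recreate under a new key in the second.

```python
# Reference-property style: the tire's key does not mention the car.
tires_by_ref = {("Tire", 1): {"car": ("Car", "A"), "size": 16}}

# Ancestor style: the car is the first element of the tire's key path.
tires_by_parent = {(("Car", "A"), ("Tire", 1)): {"size": 16}}


def move_tire_ref(tire_key, new_car):
    # Same key; only a property changes.
    tires_by_ref[tire_key]["car"] = new_car
    return tire_key


def move_tire_parent(tire_key, new_car):
    # The key itself must change: remove the old entity, write a new one.
    data = tires_by_parent.pop(tire_key)
    new_key = (new_car,) + tire_key[1:]
    tires_by_parent[new_key] = data
    return new_key


print(move_tire_ref(("Tire", 1), ("Car", "B")))  # ('Tire', 1)
print(move_tire_parent((("Car", "A"), ("Tire", 1)), ("Car", "B")))
# (('Car', 'B'), ('Tire', 1))
```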
