How to implement a one-to-many relation? Datastore, Google App Engine - google-app-engine

My application has a business entity, and each business belongs to one or more categories.
How should I implement the relation in my database?
I have two options.
First option:
(store all the categories that a specific business belongs to in the business entity)
class business(ndb.Model):
    name = ndb.StringProperty()
    # kind is given as a string because category is defined further down
    categories = ndb.KeyProperty(kind='category', repeated=True)
class category(ndb.Model):
    name = ndb.StringProperty()
Second option:
(store all the businesses that belong to a specific category in the category entity)
class business(ndb.Model):
    name = ndb.StringProperty()
class category(ndb.Model):
    name = ndb.StringProperty()
    businesses = ndb.KeyProperty(kind=business, repeated=True)
Which option should I implement?
Another problem:
every business could have one or more images.
Should I store the images in a list inside the business entity:
class business(ndb.Model):
    name = ndb.StringProperty()
    imagesUrl = ndb.StringProperty(repeated=True)
or create a new entity for each image:
class image(ndb.Model):
    businessKey = ndb.KeyProperty(repeated=True)
    imageUrl = ndb.StringProperty()
I know that the entity size is limited to 1 MB, right?

Decisions like this one usually depend on your usage patterns. In your case, however, it looks like option 1 is the logical choice, because there are many more business entities than category entities.
For example, if you need to know which categories a business belongs to when you retrieve a single business entity, you will have to run an extra query if you choose option 2. With option 1, this extra query is unnecessary.
Another consideration is the frequency of updates. If you go with option 2, you will have to update your category entity every time you add a new business entity (thus, two entities have to be updated, which impacts performance and costs). With option 1 you only need to update one entity.
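To make the update-cost argument concrete, here is a minimal sketch that counts entity writes under each option. It uses plain Python dictionaries as a stand-in for the datastore, not the ndb API, and the function names and keys such as "cat:food" are made up for illustration.

```python
# In-memory stand-in for the datastore: maps an entity key to its properties.
# Illustration only; nothing here calls ndb.

def add_business_option1(store, business_key, name, category_keys):
    """Option 1: the business holds the list of category keys."""
    store[business_key] = {"name": name, "categories": list(category_keys)}
    return 1  # only the business entity is written


def add_business_option2(store, business_key, name, category_keys):
    """Option 2: each category holds the list of business keys."""
    store[business_key] = {"name": name}
    writes = 1
    for category_key in category_keys:
        store[category_key]["businesses"].append(business_key)
        writes += 1  # every affected category entity is rewritten too
    return writes


store1 = {"cat:food": {"name": "Food"}, "cat:bar": {"name": "Bar"}}
store2 = {
    "cat:food": {"name": "Food", "businesses": []},
    "cat:bar": {"name": "Bar", "businesses": []},
}

writes1 = add_business_option1(store1, "biz:1", "Joe's", ["cat:food", "cat:bar"])
writes2 = add_business_option2(store2, "biz:1", "Joe's", ["cat:food", "cat:bar"])
```

With a business in two categories, option 1 writes one entity while option 2 writes the business plus one entity per category, and the gap grows with the number of categories.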

Related

GAE: NDB query on list

Google App Engine NDB data model like so:
Users
Username
FirstName
LastName
Posts
PostID
PosterUsername
SubscribedPosts
PostID
SubscriberUsername
For a specific user, I want to return all the Posts which the user is subscribed to and display them on the page.
Since the wonderful NDB doesn't support JOINs, we do two queries:
postIDList = SubscribedPosts.query(SubscribedPosts.SubscriberUsername == 'johndoe').fetch()
This gives us a list of SubscribedPosts. So how do I take my postIDList list and use it as a filter criteria for a Posts query?
Something like:
results = Posts.query(Post.PostID IN postIDList.PostID)
In a normal relational database, this would be a simple query using table joins. How is this done in Google's ndb?
You are going to run into lots of bottlenecks if you try to design your datastore models the same way you would design tables in a relational database as you have in this example.
Your comment points in one workable direction, although there are other solutions. Going that route, I would drop the "SubscribedPosts" model altogether and use a repeated KeyProperty in the User model to store subscribed posts.
See this related post: One-To-Many Example in NDB
It seems you are looking to model a many-to-many relationship, not one-to-many. Read Modelling Entity Relationships (although this is for the older db, not the newer ndb, it still gives the idea).
One of the two entities should maintain a list of keys (repeated=True) of the related other entities. Which entity should hold the list? Preferably the list should be on the side that usually has fewer relationships so that the list of keys is smaller. Another consideration is which side likely has less contention for updates.
In your specific case, let's say on average users subscribe to 10 posts and on average each post has 100 users subscribed to it. In this case, we would want to put the list of keys on the Users side of the relation.
from google.appengine.ext import ndb

class Users(ndb.Model):
    user_name = ndb.StringProperty()
    first_name = ndb.StringProperty()
    last_name = ndb.StringProperty()
    # keys of the Posts this user is subscribed to
    posts = ndb.KeyProperty(kind='Posts', repeated=True)

class Posts(ndb.Model):
    post_id = ndb.StringProperty()
    poster_user_name = ndb.StringProperty()
Establish the relationship by adding to the list in the Users instance:
current_user.posts.append(current_post.key)
For a given Users instance, getting all subscribed Posts is easy since the list of keys of the subscribed Posts is already within the given Users:
ndb.get_multi(given_user.posts)
For a given Posts instance, get all subscribing Users by ...
query = Users.query(Users.posts == given_post.key)
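The two access directions can be pictured with a small in-memory sketch. This is plain Python with dictionaries standing in for the Users and Posts kinds, not ndb itself, and the sample names and keys are invented for illustration.

```python
# Dictionaries standing in for the two kinds; the keys are plain strings.
posts = {
    "post:1": {"poster_user_name": "alice"},
    "post:2": {"poster_user_name": "bob"},
    "post:3": {"poster_user_name": "alice"},
}
users = {
    "johndoe": {"posts": ["post:1", "post:3"]},
    "janedoe": {"posts": ["post:3"]},
}


def subscribed_posts(user_name):
    """Forward direction: a batch lookup by key, in the spirit of ndb.get_multi."""
    return [posts[key] for key in users[user_name]["posts"]]


def subscribers(post_key):
    """Reverse direction: an equality filter on the repeated property."""
    return sorted(
        name for name, user in users.items() if post_key in user["posts"]
    )


johndoe_posts = subscribed_posts("johndoe")
post3_subscribers = subscribers("post:3")
```

The forward direction needs no query at all because the keys are already on the user; the reverse direction scans here, whereas the datastore would answer it from an index on the repeated property.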

How would I achieve this using Google App Engine Datastore?

I am a beginner to Datastore and I am wondering how I should use it to achieve what I want to do.
For example, my app needs to keep track of customers and all their purchases.
Coming from a relational database, I could achieve this by creating [Customers] and [Purchases] tables.
In Datastore, I can make [Customers] and [Purchases] kinds.
Where I am struggling is the structure of the [Purchases] kind.
If I make [Purchases] as the child of [Customers] kind, would there be one entity in [Customers] and one entity in [Purchases] that share the same key? Does this mean inside of this [Purchases] entity, I would have a property that just keeps increasing for each purchase they make?
Or would I have one [Purchases] entity for each purchase they make, and in each of these entities a property that points to an entity in the [Customers] kind?
How does Datastore perform in these scenarios?
Sounds like you don't fully understand ancestors. Let's go with the non-ancestor version first, which is a legitimate way to go:
from google.appengine.ext import ndb

class Customer(ndb.Model):
    # customer data fields
    name = ndb.StringProperty()

class Purchase(ndb.Model):
    # reference to the customer who made this purchase
    customer = ndb.KeyProperty(kind=Customer)
    # purchase data fields
    price = ndb.IntegerProperty()
This is the basic way to go. You'll have one entity in the datastore for each customer. You'll have one entity in the datastore for each purchase, with a KeyProperty that points to the customer.
If you have a purchase and need to find the associated customer, it's right there.
purchase_entity.customer.get()
If you have a Customer, you can issue a query to find all the purchases that belong to the customer:
Purchase.query(Purchase.customer == customer_entity.key).fetch()
In this case, whenever you write either a customer or purchase entity, the GAE datastore will write that entity to any one of the datastore machines running in the cloud that's not busy. You can have really high write throughput this way. However, when you query for all the purchases of a given customer, you just read back the most current data in the indexes. If a new purchase was added but the indexes have not been updated yet, you may get stale data (eventual consistency). You're stuck with this behavior unless you use ancestors.
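The stale-read behaviour can be illustrated with a toy model. The class below is a hypothetical plain-Python sketch in which entity writes are visible immediately but the index used by queries lags behind; it is an assumption-laden simplification, not how the datastore is actually implemented.

```python
class EventuallyConsistentStore:
    """Toy model: lookups by key are current, index-backed queries can lag."""

    def __init__(self):
        self.entities = {}  # key -> entity, updated immediately
        self.index = {}     # customer key -> list of purchase keys
        self.pending = []   # index updates that have not been applied yet

    def put_purchase(self, purchase_key, customer_key):
        self.entities[purchase_key] = {"customer": customer_key}
        self.pending.append((customer_key, purchase_key))

    def apply_index_updates(self):
        """Simulate the index catching up some time after the write."""
        for customer_key, purchase_key in self.pending:
            self.index.setdefault(customer_key, []).append(purchase_key)
        self.pending = []

    def get(self, key):
        return self.entities.get(key)

    def query_purchases(self, customer_key):
        return list(self.index.get(customer_key, []))


store = EventuallyConsistentStore()
store.put_purchase("purchase:1", "customer:1")
stale = store.query_purchases("customer:1")   # index has not caught up
direct = store.get("purchase:1")              # lookup by key sees the write
store.apply_index_updates()
fresh = store.query_purchases("customer:1")   # now the query sees it
```

The point of the sketch is only the ordering: a query issued right after the write can miss the new purchase, while a lookup by key does not.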
Now as for the ancestor version. The basic concept is essentially the same. You still have a customer entity, and separate entities for each purchase. The purchase is NOT part of the customer entity. However, when you create a purchase using a customer as an ancestor, it (roughly) means that the purchase is stored on the same machine in the datastore that the customer entity was stored on. In this case, your write performance is limited to the performance of that one machine, and is advertised as one write per second. As a benefit, though, you can query that machine using an ancestor query and get an up-to-date list of all the purchases of a given customer.
The syntax for using ancestors is a bit different. The customer part is the same. However, when you create purchases, you'd create them as:
purchase1 = Purchase(parent=customer_entity.key)
purchase2 = Purchase(parent=customer_entity.key)
This example creates two separate purchase entities. Each purchase will have a different key, and the customer has its own key as well. However, each purchase key will have the customer_entity's key embedded in it. So you can think of the purchase key being twice as long. However, you don't need to keep a separate KeyProperty() for the customer anymore, since you can find it in the purchase's key.
class Purchase(ndb.Model):
    # you don't need a KeyProperty for the customer anymore
    # purchase data fields
    price = ndb.IntegerProperty()

# find the customer from a purchase via the parent part of its key
purchase.key.parent().get()
And in order to query for all the purchases of a given customer:
Purchase.query(ancestor=customer_entity.key).fetch()
The actual structure of the entities doesn't change much; mostly the syntax does. But the ancestor queries are fully consistent.
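The idea that a purchase key embeds the customer's key can be pictured with plain tuples. This is a rough sketch, not ndb: the helper names make_key, parent and has_ancestor are hypothetical, and the flat (kind, id, kind, id, ...) layout is only meant to loosely mirror how a key path is built up.

```python
def make_key(*path):
    """Build a key as a flat tuple: kind1, id1, kind2, id2, ..."""
    if not path or len(path) % 2 != 0:
        raise ValueError("path must alternate kind and id")
    return tuple(path)


def parent(key):
    """Return the parent key, or None for a root key."""
    return key[:-2] or None


def has_ancestor(key, ancestor_key):
    """True if ancestor_key is a prefix of key (the basis of an ancestor query)."""
    return key[:len(ancestor_key)] == ancestor_key


customer_key = make_key("Customer", 42)
purchase1 = make_key("Customer", 42, "Purchase", 1)
purchase2 = make_key("Customer", 42, "Purchase", 2)
other = make_key("Customer", 7, "Purchase", 1)

# Keep only the purchases whose key starts with the customer's key.
purchases_of_customer = [
    key for key in (purchase1, purchase2, other) if has_ancestor(key, customer_key)
]
```

Because the customer's key is a prefix of every purchase key, finding the customer needs no extra property, and all purchases of one customer share the same prefix.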
The third option that you kinda describe is not recommended. I'm just including it for completeness. It's a bit confusing, and would go something like this:
class Purchase(ndb.Model):
    # purchase data fields
    price = ndb.IntegerProperty()

class Customer(ndb.Model):
    purchases = ndb.StructuredProperty(Purchase, repeated=True)
This is a special case which uses ndb.StructuredProperty. In this case, you will only have a single Customer entity in the datastore. While there's a class for purchases, your purchases won't get stored as separate entities - they'll just be stored as data within the Customer entity.
There may be a couple of reasons to do this. You're only dealing with one entity, so your data fetch will be fully consistent. You also have reduced write costs when you have to update a bunch of purchases, since you're only writing a single entity. And you can still query on the properties of the Purchase class. However, this was designed for a limited number of repeated objects, not hundreds or thousands. And each entity is limited to a total size of 1MB, so you'll eventually hit that and you won't be able to add more purchases.
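To get a feel for how quickly embedded purchases eat into the 1MB limit, here is a rough estimate in plain Python. It assumes JSON length as a stand-in for the stored size, which is not the datastore's real encoding, so the numbers are only indicative; the field names and the helper functions are invented for illustration.

```python
import json

MAX_ENTITY_BYTES = 1024 * 1024  # roughly the 1MB limit mentioned above


def approx_size(entity):
    """Approximate the stored size by the length of the JSON encoding."""
    return len(json.dumps(entity).encode("utf-8"))


def fits_in_one_entity(customer):
    return approx_size(customer) <= MAX_ENTITY_BYTES


# A hypothetical purchase record of a bit over 100 bytes.
purchase = {"price": 1999, "note": "x" * 100}

small_customer = {"name": "c", "purchases": [purchase] * 100}
large_customer = {"name": "c", "purchases": [purchase] * 20000}

small_ok = fits_in_one_entity(small_customer)
large_ok = fits_in_one_entity(large_customer)
```

A hundred embedded purchases are far below the limit, but tens of thousands of them are not, which is the practical reason to keep purchases as separate entities.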
(from your personal tags I assume you are a java guy, using GAE+java)
First, don't use the ancestor relationships - this has a special purpose to define the transaction scope (aka Entity Groups). It comes with several limitations and should not be used for normal relationships between entities.
Second, do use an ORM instead of low-level API: my personal favourite is objectify. GAE also offers JDO or JPA.
In GAE relations between entities are simply created by storing a reference (a Key) to an entity inside another entity.
In your case there are two possibilities for creating a one-to-many relationship between a Customer and its Purchases.
import java.util.List;
import com.googlecode.objectify.Key;
import com.googlecode.objectify.annotation.Id;

public class Customer {
    @Id
    public Long customerId; // 'Long' identifiers are autogenerated
    // first option: parent-to-children references
    public List<Key<Purchase>> purchases; // one-to-many parent-to-child
}

public class Purchase {
    @Id
    public Long purchaseId;
    // option two: child-to-parent reference
    public Key<Customer> customer;
}
Whether you use option 1 or option 2 (or both) depends on how you plan to access the data. With option 1 you fetch the purchases with a get by key; with option 2 you find them with a query. The two differ in cost and speed, with get always being faster and cheaper.
Note: references in GAE Datastore are manual, there is no referential integrity: deleting one part of a relationship will produce no warning/error from Datastore. When you remove entities it's up to your code to fix references - use transactions to update two entities consistently (hint: no need to use Entity Groups - to update two entities in a transaction you can use XG transactions, enabled by default in objectify).
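Since nothing enforces the references, the cleanup has to live in application code. A minimal sketch of that bookkeeping, using plain Python dictionaries rather than the Datastore API (the helper names and sample ids are hypothetical):

```python
def dangling_references(customers, purchases):
    """Return (customer_id, purchase_id) pairs that point at missing purchases."""
    return [
        (customer_id, purchase_id)
        for customer_id, customer in customers.items()
        for purchase_id in customer["purchases"]
        if purchase_id not in purchases
    ]


def delete_purchase(customers, purchases, purchase_id):
    """Delete a purchase and drop every reference to it."""
    purchases.pop(purchase_id, None)
    for customer in customers.values():
        customer["purchases"] = [
            p for p in customer["purchases"] if p != purchase_id
        ]


customers = {1: {"purchases": [10, 11]}}
purchases = {10: {"price": 5}, 11: {"price": 7}}

# Naive delete: the purchase is gone but the customer still points at it.
del purchases[10]
before_cleanup = dangling_references(customers, purchases)

# Deleting through the helper removes the stale reference as well.
delete_purchase(customers, purchases, 10)
after_cleanup = dangling_references(customers, purchases)
```

In a real application both updates would be done inside one transaction so that a failure cannot leave the reference and the entity out of step.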
I think the best approach in this specific case would be to use a parent structure.
class Customer(ndb.Model):
    pass

class Purchase(ndb.Model):
    pass

customer = Customer()
customer_key = customer.put()
purchase = Purchase(parent=customer_key)
You could then get all purchases of a customer using
purchases = Purchase.query(ancestor=customer_key)
or get the customer who bought the purchase using
customer = purchase.key.parent().get()
It might indeed be a good idea to keep track of the purchase count if you use that value a lot.
You could do that using a _pre_put_hook or _post_put_hook
class Customer(ndb.Model):
    # default=0 so that the first increment does not fail on None
    count = ndb.IntegerProperty(default=0)

class Purchase(ndb.Model):
    def _post_put_hook(self, future):
        # TODO check whether this is a new entity.
        customer = self.key.parent().get()
        customer.count += 1
        customer.put()
It would also be good practice to do this in a transaction, so that the count is rolled back when putting the purchase fails, and the other way around.
@ndb.transactional
def save_purchase(purchase):
    purchase.put()
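What the transaction buys you is all-or-nothing behaviour across the purchase write and the counter update. The sketch below imitates that in plain Python by restoring a snapshot on failure; the snapshot trick, the store layout and the fail flag are assumptions for illustration and are not how ndb implements transactions.

```python
import copy


def save_purchase(store, customer_id, purchase_id, purchase, fail=False):
    """Write the purchase and bump the customer's count, or change nothing."""
    snapshot = copy.deepcopy(store)
    try:
        store["purchases"][purchase_id] = purchase
        if fail:
            raise RuntimeError("simulated failure between the two writes")
        store["customers"][customer_id]["count"] += 1
    except Exception:
        # Roll back to the state before the first write, then re-raise.
        store.clear()
        store.update(snapshot)
        raise


store = {"customers": {"c1": {"count": 0}}, "purchases": {}}

save_purchase(store, "c1", "p1", {"price": 10})

failed = False
try:
    save_purchase(store, "c1", "p2", {"price": 5}, fail=True)
except RuntimeError:
    failed = True
```

After the failed call neither the purchase nor the count has changed, so the counter and the set of purchases cannot drift apart.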

Google App Engine Entity Ownership

I am writing an app for GAE in Python which stores recipes for different users. I have an entity called User in the datastore and an entity called Recipe. I want to be able to set the owner of each Recipe to the User who created it. Also, I want each User entity to contain a list of all Recipes belonging to that User as well as being able to query the Recipe database to find all Recipes owned by a particular User.
What is the best way to go about creating this parent/child type relationship?
Thanks
There are two main ways. (I am going to assume you're using Python, which determines the examples.)
Option 1. Make the User the ancestor of all of their recipes
recipe = Recipe(parent=user.key)
Option 2. Use a key property
class Recipe(ndb.Model):
    owner = ndb.KeyProperty()
recipe = Recipe(owner=user.key)
All recipes for a user with option 1
recipes = Recipe.query(ancestor=user.key)
All recipes for a user with option 2
recipes = Recipe.query().filter(Recipe.owner == user.key)
Which one you use really depends a lot on what you plan to do with the data after creation, transaction patterns etc.... You should elaborate on your use cases. Both will work.
Also, you should read up on transactions and entity groups and understand them to really determine whether you want to use ancestors: https://developers.google.com/appengine/docs/java/datastore/transactions?hl=en .
If you use db.Model to model a one-to-many relationship, you can use the ReferenceProperty constructor and specify a collection_name. For example, one book may have many reviews.
class Book(db.Model):
    title = db.StringProperty()
    author = db.StringProperty()

class BookReview(db.Model):
    book = db.ReferenceProperty(Book, collection_name='reviews')

b = Book()
b.put()
br = BookReview()
br.book = b  # sets br's 'book' property to b's key
br.put()  # the review must be stored before it shows up in b.reviews
for review in b.reviews:  # use collection_name to retrieve all reviews for a book
    ....
see https://developers.google.com/appengine/docs/python/datastore/datamodeling#references
Alternatively, you can use ndb's KeyProperty as in Tim's answer.
Also see
db.ReferenceProperty() vs ndb.KeyProperty in App Engine

One-to-Many relationship in ndb

I am reading up on Google app engine and preparing a sample to understand it better.
In a nutshell the user can record an entry for every day in the month, like a calendar.
And the user can view the entries on monthly basis. So no more than 30 ish at a time.
Initially I had used db and the one-to-many relationship was straight forward.
But once I came across the ndb, I realized there are two ways of modelling a one-to-many relationship.
1) The structured property seems to act like a repeated property on the User model. Does it mean that if I retrieve one user, I would automatically retrieve all the records she has entered (e.g. the entire year)? This isn't very efficient though, is it? I guess the advantage is that you get all related data in one read operation.
from google.appengine.ext import ndb

# Record is defined first because StructuredProperty needs the class itself
class Record(ndb.Model):
    notes = ndb.TextProperty()

class User(UserMixin, ndb.Model):
    email = ndb.StringProperty(required=True)
    password_hash = ndb.TextProperty(required=True)
    record = ndb.StructuredProperty(Record, repeated=True)
2) Alternatively I could use perhaps the more classic way:
class User(UserMixin, ndb.Model):
    email = ndb.StringProperty(required=True)
    password_hash = ndb.TextProperty(required=True)

class Record(ndb.Model):
    user = ndb.KeyProperty(kind=User)
    notes = ndb.TextProperty()
Which way is the better way in my use case?
The downside of using StructuredProperty instead of KeyProperty is that with StructuredProperty the limit on total entity size (1MB) applies to the sum of the User and all Records it contains (because the structured properties are serialized as part of the User entity). With KeyProperty, each Record has a 1MB limit by itself.

de-normalizing data model: django/sql -> app engine

I'm just starting to get my head around non-relational databases, so I'd like to ask some help with converting these traditional SQL/django models into Google App Engine model(s).
The example is for event listings, where each event has a category, belongs to a venue, and a venue has a number of photos attached to it.
In django, I would model the data like this:
class Event(models.Model):
    title = models.CharField()
    start = models.DateTimeField()
    # related models are named as strings because they are defined below
    category = models.ForeignKey('Category')
    venue = models.ForeignKey('Venue')

class Category(models.Model):
    name = models.CharField()

class Venue(models.Model):
    name = models.CharField()
    address = models.CharField()

class Photo(models.Model):
    venue = models.ForeignKey(Venue)
    source = models.CharField()
How would I accomplish the equivalent with App Engine models?
There's nothing here that must be de-normalized to work with App Engine. You can change ForeignKey to ReferenceProperty, CharField to StringProperty and DateTimeField to DateTimeProperty and be done. It might be more efficient to store category as a string rather than a reference, but this depends on usage context.
Denormalization becomes important when you start designing queries. Unlike traditional SQL, you can't write ad-hoc queries that have access to every row of every table. Anything you want to query for must be satisfied by an index. If you're running queries today that depend on table scans and complex joins, you'll have to make sure that the query parameters are indexed at write-time instead of calculating them on the fly.
As an example, if you wanted to do a case-insensitive search by event title, you'd have to store a lower-case copy of the title on every entity at write time. Without guessing your query requirements, I can't really offer more specific advice.
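The lower-case-copy idea can be sketched in a few lines of plain Python. A list of dictionaries stands in for the Event kind here, and put_event, find_by_title and the title_lower field are hypothetical names chosen for illustration.

```python
events = []  # stand-in for the stored Event entities


def put_event(title):
    """Store the title together with a lower-cased copy computed at write time."""
    events.append({"title": title, "title_lower": title.lower()})


def find_by_title(query):
    """Case-insensitive lookup as an exact match on the precomputed copy."""
    wanted = query.lower()
    return [event["title"] for event in events if event["title_lower"] == wanted]


put_event("Jazz Night")
put_event("Open Mic")

matches = find_by_title("jazz NIGHT")
```

The query itself stays a simple equality filter, which is the kind of query an index can satisfy; the extra work is paid once, when the entity is written.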
It's possible to run Django on App Engine
You need a trio of apps from here:
http://www.allbuttonspressed.com/projects
Django-nonrel
djangoappengine
djangotoolbox
Additionally, this module makes it possible to do the joins across Foreign Key relationships which are not directly supported by datastore methods:
django-dbindexer
...it denormalises the fields you want to join against, but it has a limitation: it doesn't update the denormalised values automatically, so it is only really suitable for static values.
Django signals provide a useful starting point for automatic denormalisation.

Resources