Cascade delete in Google datastore for ReferenceProperty objects - google-app-engine

Is there a concept of having the Datastore in Google App Engine carry out cascade deletes where ReferenceProperty has been used? I understand that the Datastore in GAE is not a relational database. However, consider a simple model where blog posts can be liked by users.
class Post(db.Model):
    subject = db.StringProperty(required=True)
    content = db.TextProperty(required=True)
    created = db.DateTimeProperty(auto_now_add=True)
    created_by = db.ReferenceProperty(User, required=True,
                                      collection_name='posts')
and:
class Like(db.Model):
    post = db.ReferenceProperty(Post, required=True, collection_name='likes')
    user = db.ReferenceProperty(User, required=True, collection_name='likes')
When it comes to deleting a post, I want all "likes" to be deleted also.
def delete(self, post_key):
    """ Deletes a post from the datastore """
    db.delete(post_key)
    # TODO: Should really delete any corresponding likes
    # and comments too (else they'll be orphaned)
So, must I code these deletions of likes myself, or can GAE do it automatically?
Thanks for any assistance that anyone could provide to increase my understanding.

There is no cascaded/recursive delete in the datastore; you have to implement it yourself.
These might help (same goal, different reason):
How to delete an entity including all children
Recursive delete in google app engine
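Since the datastore will not cascade for you, one way to implement the delete is to collect the keys of the dependent Like entities and remove them together with the post in one batch. The following is only a minimal sketch: `cascade_delete`, `find_child_keys` and `delete_keys` are hypothetical names, and the in-memory dictionary merely stands in for the datastore so the logic can be run anywhere.

```python
def cascade_delete(post_key, find_child_keys, delete_keys):
    """Delete a post together with every entity that references it.

    find_child_keys(post_key) returns the keys of dependent entities.
    delete_keys(keys) deletes all given keys in one batch call.
    """
    keys = list(find_child_keys(post_key))
    keys.append(post_key)
    delete_keys(keys)
    return keys


# In-memory stand-in for the datastore (an assumption for illustration).
store = {
    'post1': {'kind': 'Post'},
    'post2': {'kind': 'Post'},
    'like1': {'kind': 'Like', 'post': 'post1'},
    'like2': {'kind': 'Like', 'post': 'post1'},
    'like3': {'kind': 'Like', 'post': 'post2'},
}


def find_likes(post_key):
    # Keys of all Like entities whose 'post' reference is post_key.
    return sorted(k for k, v in store.items()
                  if v['kind'] == 'Like' and v.get('post') == post_key)


def delete_from_store(keys):
    for key in keys:
        store.pop(key, None)


deleted = cascade_delete('post1', find_likes, delete_from_store)
```

With the db API, `find_child_keys` could be a keys-only query such as `Like.all(keys_only=True).filter('post =', post_key).fetch(1000)`, and `delete_keys` could be `db.delete`, which accepts a list of keys. The limit of 1000 is an arbitrary choice here; a post with more likes would need the query repeated until it returns nothing. Note that such a batch is not atomic unless it runs in a transaction that the entities' grouping allows.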

Related

GAE: NDB query on list

Google App Engine NDB data model like so:
Users
    Username
    FirstName
    LastName
Posts
    PostID
    PosterUsername
SubscribedPosts
    PostID
    SubscriberUsername
For a specific user, I want to return all the Posts which the user is subscribed to and display them on the page.
Since the wonderful NDB doesn't support JOINs, we do two queries:
postIDList = SubscribedPosts.query(SubscribedPosts.SubscriberUsername == 'johndoe').fetch()
This gives us a list of SubscribedPosts. So how do I take my postIDList list and use it as a filter criteria for a Posts query?
Something like:
results = Posts.query(Post.PostID IN postIDList.PostID)
In a normal relational database, this would be a simple query using table joins. How is this done in Google's ndb?
You are going to run into lots of bottlenecks if you try to design your datastore models the same way you would design tables in a relational database as you have in this example.
Your comment goes in one possible right direction, although there are other solutions. Going that route, I would drop the "SubscribedPosts" model altogether and use a repeated KeyProperty in the User model to store subscribed posts.
See this related post: One-To-Many Example in NDB
It seems you are looking to model a many-to-many relationship, not one-to-many. Read Modelling Entity Relationships (although this is for the older db, not the newer ndb, it still gives the idea).
One of the two entities should maintain a list of keys (repeated=True) of the related other entities. Which entity should hold the list? Preferably the list should be on the side that usually has fewer relationships so that the list of keys is smaller. Another consideration is which side likely has less contention for updates.
In your specific case, let's say that on average users subscribe to 10 posts and each post has 100 users subscribed to it. In this case, we would want to put the list of keys on the Users side of the relation.
class Users(ndb.Model):
    user_name = ndb.StringProperty()
    first_name = ndb.StringProperty()
    last_name = ndb.StringProperty()
    posts = ndb.KeyProperty(kind='Posts', repeated=True)
class Posts(ndb.Model):
    post_id = ndb.StringProperty()
    poster_user_name = ndb.StringProperty()
Establish the relationship by adding to the list in the Users instance:
current_user.posts.append(current_post.key)
current_user.put()  # persist the updated key list
For a given Users instance, getting all subscribed Posts is easy since the list of keys of the subscribed Posts is already within the given Users:
ndb.get_multi(given_user.posts)
For a given Posts instance, get all subscribing Users by querying on the repeated property:
query = Users.query(Users.posts == given_post.key)
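To make the two lookups concrete, here is a small plain-Python stand-in (not ndb itself; the `User` class, `get_multi` and `subscribers_of` below are hypothetical helpers written for illustration). It mirrors the point that an equality filter on a repeated property matches an entity when any element of its list equals the value.

```python
class User(object):
    def __init__(self, name, post_keys):
        self.name = name
        self.posts = list(post_keys)  # plays the role of the repeated KeyProperty


def get_multi(post_table, keys):
    # Stand-in for ndb.get_multi: one lookup per key, order preserved.
    return [post_table.get(key) for key in keys]


def subscribers_of(users, post_key):
    # Stand-in for Users.query(Users.posts == post_key): a user matches
    # when ANY element of its key list equals post_key.
    return [u for u in users if post_key in u.posts]


posts = {'p1': 'First post', 'p2': 'Second post', 'p3': 'Third post'}
users = [User('johndoe', ['p1', 'p3']), User('janedoe', ['p3'])]

johns_posts = get_multi(posts, users[0].posts)
p3_subscribers = [u.name for u in subscribers_of(users, 'p3')]
p2_subscribers = [u.name for u in subscribers_of(users, 'p2')]
```

The forward direction needs no query at all, only key lookups, which is why the list is best kept on the side that is read most often and has the fewest relationships.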

Google App Engine Entity Ownership

I am writing an app for GAE in Python which stores recipes for different users. I have an entity called User in the datastore and an entity called Recipe. I want to be able to set the owner of each Recipe to the User who created it. Also, I want each User entity to contain a list of all Recipes belonging to that User as well as being able to query the Recipe database to find all Recipes owned by a particular User.
What is the best way to go about creating this parent/child type relationship?
Thanks
There are two main ways. (I am going to assume you're using Python, which determines the examples.)
Option 1. Make the User the ancestor of all of their recipes
recipe = Recipe(parent=user.key)
Option 2. Use key property
class Recipe(ndb.Model):
    owner = ndb.KeyProperty()
recipe = Recipe(owner=user.key)
All recipes for a user with option 1:
recipes = Recipe.query(ancestor=user.key)
All recipes for a user with option 2:
recipes = Recipe.query().filter(Recipe.owner == user.key)
Which one you use really depends on what you plan to do with the data after creation, your transaction patterns, etc. You should elaborate on your use cases. Both will work.
Also, you should read up on transactions and entity groups and understand them before deciding whether you want to use ancestors: https://developers.google.com/appengine/docs/java/datastore/transactions?hl=en
If you use db.Model, you can model a one-to-many relationship with the ReferenceProperty constructor by specifying a collection_name. For example, one book may have many reviews.
class Book(db.Model):
    title = db.StringProperty()
    author = db.StringProperty()
class BookReview(db.Model):
    book = db.ReferenceProperty(Book, collection_name='reviews')
b = Book()
b.put()
br = BookReview()
br.book = b  # sets br's 'book' property to b's key
br.put()  # save the review so it shows up in b.reviews
for review in b.reviews:  # use collection_name to retrieve all reviews for a book
    ....
See https://developers.google.com/appengine/docs/python/datastore/datamodeling#references
Alternatively, you can use ndb's KeyProperty as in Tim's answer.
Also see
db.ReferenceProperty() vs ndb.KeyProperty in App Engine

One-to-Many relationship in ndb

I am reading up on Google app engine and preparing a sample to understand it better.
In a nutshell the user can record an entry for every day in the month, like a calendar.
And the user can view the entries on monthly basis. So no more than 30 ish at a time.
Initially I had used db and the one-to-many relationship was straight forward.
But once I came across the ndb, I realized there are two ways of modelling a one-to-many relationship.
1) The structured property seems to act like a repeated property on the User model. Does it mean that if I retrieve one user, I would automatically retrieve all the records she has entered (e.g. the entire year)? This isn't very efficient though, is it? I guess the advantage is that you get all related data in one read operation.
from google.appengine.ext import ndb
class Record(ndb.Model):  # must be defined before User references it
    notes = ndb.TextProperty()
class User(UserMixin, ndb.Model):
    email = ndb.StringProperty(required=True)
    password_hash = ndb.TextProperty(required=True)
    record = ndb.StructuredProperty(Record, repeated=True)
2) Alternatively I could use perhaps the more classic way:
class User(UserMixin, ndb.Model):
    email = ndb.StringProperty(required=True)
    password_hash = ndb.TextProperty(required=True)
class Record(ndb.Model):
    user = ndb.KeyProperty(kind=User)
    notes = ndb.TextProperty()
Which way is the better way in my use case?
The downside of using StructuredProperty instead of KeyProperty is that with StructuredProperty the limit on total entity size (1MB) applies to the sum of the User and all Records it contains (because the structured properties are serialized as part of the User entity). With KeyProperty, each Record has a 1MB limit by itself.
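A rough back-of-the-envelope calculation shows how quickly that shared limit can be reached. The sizes below are assumptions chosen purely for illustration (they are not measurements, and real serialization overhead is ignored), and `max_embedded_records` is a hypothetical helper.

```python
ENTITY_LIMIT_BYTES = 1024 * 1024  # the 1MB limit, taken here as 1 MiB (assumption)


def max_embedded_records(user_bytes, record_bytes, limit=ENTITY_LIMIT_BYTES):
    """How many records fit inside one User entity before hitting the limit."""
    return (limit - user_bytes) // record_bytes


# Assumed sizes for illustration: 1 KiB of user data, 2 KiB of notes per record.
capacity = max_embedded_records(user_bytes=1024, record_bytes=2048)
years_of_daily_entries = capacity / 365.0
```

Under these assumed sizes, a User with one embedded Record per day would run out of room in under a year and a half, whereas separate Record entities linked by KeyProperty have no such combined ceiling.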

de-normalizing data model: django/sql -> app engine

I'm just starting to get my head around non-relational databases, so I'd like to ask some help with converting these traditional SQL/django models into Google App Engine model(s).
The example is for event listings, where each event has a category, belongs to a venue, and a venue has a number of photos attached to it.
In django, I would model the data like this:
class Event(models.Model):
    title = models.CharField()
    start = models.DateTimeField()
    category = models.ForeignKey(Category)
    venue = models.ForeignKey(Venue)
class Category(models.Model):
    name = models.CharField()
class Venue(models.Model):
    name = models.CharField()
    address = models.CharField()
class Photo(models.Model):
    venue = models.ForeignKey(Venue)
    source = models.CharField()
How would I accomplish the equivalent with App Engine models?
There's nothing here that must be de-normalized to work with App Engine. You can change ForeignKey to ReferenceProperty, CharField to StringProperty and DateTimeField to DateTimeProperty and be done. It might be more efficient to store category as a string rather than a reference, but this depends on usage context.
Denormalization becomes important when you start designing queries. Unlike traditional SQL, you can't write ad-hoc queries that have access to every row of every table. Anything you want to query for must be satisfied by an index. If you're running queries today that depend on table scans and complex joins, you'll have to make sure that the query parameters are indexed at write-time instead of calculating them on the fly.
As an example, if you wanted to do a case-insensitive search by event title, you'd have to store a lower-case copy of the title on every entity at write time. Without guessing your query requirements, I can't really offer more specific advice.
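The write-time normalization described above can be sketched in plain Python. `EventIndex` is a hypothetical in-memory stand-in for an indexed property, not a datastore API; the point is only that the lower-cased value is computed once when the entity is saved, and that the query normalizes its input the same way.

```python
class EventIndex(object):
    """In-memory stand-in for an indexed lower-case title property."""

    def __init__(self):
        self._by_title_lower = {}

    def put(self, event_id, title):
        # Compute the normalized value once, at write time.
        self._by_title_lower.setdefault(title.lower(), []).append(event_id)

    def find_by_title(self, title):
        # Normalize the search term the same way, then do an exact match.
        return list(self._by_title_lower.get(title.lower(), []))


index = EventIndex()
index.put(1, 'Jazz Night')
index.put(2, 'JAZZ NIGHT')
index.put(3, 'Open Mic')

matches = index.find_by_title('jazz night')
```

On a real model this would typically be an extra string property (for example one named `title_lower`, a name assumed here) that is set just before the entity is saved and then used in an ordinary equality filter.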
It's possible to run Django on App Engine.
You need a trio of apps from here:
http://www.allbuttonspressed.com/projects
Django-nonrel
djangoappengine
djangotoolbox
Additionally, this module makes it possible to do the joins across Foreign Key relationships which are not directly supported by datastore methods:
django-dbindexer
...it denormalises the fields you want to join against, but has some limitations: it doesn't update the denormalised values automatically, so it is only really suitable for static values.
Django signals provide a useful starting point for automatic denormalisation.

GQL + Join Table Query Replacement for Google App Engine Datastore

Given the following Many to Many Relationship designed in Google App Engine Datastore:
User
    PK: UserID
    Name
Company
    PK: CompanyID
    Name
CompanyReview
    CK CompanyID
    CK UserID
    ReviewContent
From a query optimization standpoint, what's the best way to query these relationship tables to show users' reviews of a selected company?
Currently, I'm doing the following:
results = CompanyReview.all().filter('owned_by = ', company).filter('written_by = ', user).fetch(10)
This retrieves the data from the CompanyReview table. However, I then need to look up each UserID from the CompanyReview table in the User table to obtain the names of the users who have commented on the selected company.
Is there a better solution that also grabs the user name in one statement, or at least a better optimized one? Performance is the priority.
It depends on which side of the relationship will have more values. As described in this article of the Google App Engine docs, you can model many-to-many relationships by using a list of keys on one side of the relationship. "This means you should place the list on side of the relationship which you expect to have fewer values".
If both sides of the relationship will have many values, then you will really need the CompanyReview model. But pay attention to what the article says:
However, you need to be very careful because traversing the
connections of a collection will require more calls to the datastore.
Use this kind of many-to-many relationship only when you really need
to, and do so with care to the performance of your application.
This is because it uses ReferenceProperty in the relationship model:
class ContactCompany(db.Model):
    contact = db.ReferenceProperty(Contact,
                                   required=True,
                                   collection_name='companies')
    company = db.ReferenceProperty(Company,
                                   required=True,
                                   collection_name='contacts')
    title = db.StringProperty()
So if we try to access companies on a Contact entity, it will make a new query. And if we try to get attributes of the contact from a ContactCompany entity, as in contact_company.contact.name, a query for that single contact will also be made. Read the ReferenceProperty docs for more info.
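One common way to avoid a separate fetch per review is to gather the referenced keys first and load them in a single batch. The sketch below is written against injected callables so it runs anywhere; `authors_for_reviews`, `get_user_key` and `get_multi` are hypothetical names, and the dictionaries only stand in for entities.

```python
def authors_for_reviews(reviews, get_user_key, get_multi):
    """Return the author of each review using one batch lookup."""
    keys = [get_user_key(review) for review in reviews]
    unique_keys = []
    for key in keys:  # drop duplicates but keep first-seen order
        if key not in unique_keys:
            unique_keys.append(key)
    users = dict(zip(unique_keys, get_multi(unique_keys)))
    return [users[key] for key in keys]


# In-memory stand-ins (assumptions for illustration).
calls = []
user_table = {'u1': {'name': 'Ann'}, 'u2': {'name': 'Bob'}}


def fake_get_multi(keys):
    calls.append(list(keys))  # record each batch request
    return [user_table[k] for k in keys]


reviews = [{'written_by': 'u1'}, {'written_by': 'u2'}, {'written_by': 'u1'}]
authors = authors_for_reviews(reviews, lambda r: r['written_by'], fake_get_multi)
names = [author['name'] for author in authors]
```

With the db API, `get_user_key` could be `CompanyReview.written_by.get_value_for_datastore`, which returns the stored key without dereferencing it, and `get_multi` could be `db.get`, which accepts a list of keys. Three reviews then cost one batch read instead of three single reads.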
Extra:
Since you are performance-savvy, I recommend using a decorator for memcaching function returns and using this excellent layered storage library for Google App Engine.
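For readers who have not seen the pattern, a caching decorator might look roughly like the following. This is a minimal sketch, not the decorator or library referred to above: `cached` and `DictCache` are hypothetical, and `DictCache` is an in-memory stand-in exposing only `get(key)` and `set(key, value, time)`.

```python
import functools


def cached(cache, ttl_seconds=60):
    """Cache a function's return value under a key built from its arguments."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            key = '%s:%r' % (func.__name__, args)
            value = cache.get(key)
            if value is None:  # cache miss (None results are never cached)
                value = func(*args)
                cache.set(key, value, ttl_seconds)
            return value
        return wrapper
    return decorator


class DictCache(object):
    """Minimal in-memory cache with the get/set shape used above."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value, time=0):
        self.data[key] = value  # expiry is ignored in this stand-in


lookups = []
cache = DictCache()


@cached(cache)
def reviewer_name(user_id):
    lookups.append(user_id)  # pretend this is a datastore read
    return 'user-%s' % user_id


first = reviewer_name(7)
second = reviewer_name(7)
```

The second call is served from the cache, so the simulated datastore read happens only once. On App Engine the `cache` argument could be the `google.appengine.api.memcache` module, whose `get(key)` and `set(key, value, time)` functions have the same shape; keys built this way would need to stay within memcache's key length limit.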
