I recently came across a number of articles pointing out to flatten the data for NoSQL databases. Coming from traditional SQL databases I realized I am replicating a SQL db bahaviour in GAE. So I started to refactor code where possible.
We have e.g. a social media site where users can become friends with each other.
class Friendship(ndb.Model):
from_friend = ndb.KeyProperty(kind=User)
to_friend = ndb.KeyProperty(kind=User)
Effectively the app creates a friendship instance between both users.
friendshipA = Friendship(from_friend = UserA, to_friend = userB)
friendshipB = Friendship(from_friend = UserB, to_friend = userA)
How could I now move this to the actual user model to flatten it. I thought maybe I could use a StructuredProperty. I know it is limited to 5000 entries, but that should be enough for friends.
class User(UserMixin, ndb.Model):
name = ndb.StringProperty()
friends = ndb.StructuredProperty(User, repeated=True)
So I came up with this, however User can't point to itself, so it seems. Because I get a NameError: name 'User' is not defined
Any idea how I could flatten it so that a single User instance would contain all its friends, with all their properties?
You can't create a StructuredProperty that references itself. Also, use of StructuredProperty to store a copy of User has additional problem of needing to perform a manual cascade update if a user ever modifies a property that is stored.
However, as KeyProperty accept String as kind, you can easily store the list of Users using KeyProperty as suggested by #dragonx. You can further optimise read by using ndb.get_multi to avoid multiple round-trip RPC calls when retrieving friends.
Here is a sample code:
class User(ndb.Model):
name = ndb.StringProperty()
friends = ndb.KeyProperty(kind="User", repeated=True)
userB = User(name="User B")
userB_key = userB.put()
userC = User(name="User C")
userC_key = userC.put()
userA = User(name="User A", friends=[userB_key, userC_key])
userA_key = userA.put()
# To retrieve all friends
for user in ndb.get_multi(userA.friends):
print "user: %s" % user.name
Use a KeyProperty that stores the key for the User instance.
Related
I'm trying to figure out how I'm going to 'CRUD' the order of items I have in a group that I'm storing in a database. (Pseudo statement of: select * items from app where group_id = 1;)
My guess is that I just use an numeric field and just increase/decrease the number as more items are added to/removed from the group. I can then just update the items number in this field as they are moved around. However, I've seen this go really badly wrong in an old legacy app where items would get out of sync and you'd have a group where the order ended up something like this:
0,1,1,3,4,5
0,1,1,1,4,5
This wasn't handled very gracefully by the application either, and broke the application necessitating manual intervention to reorder the items in the DB.
Is there a way to avoid this pitfall?
EDIT: I would also maybe want the items available in multiple groups with multiple orders.
I think in that case I would need a many to many relationship for both the group to item relationship and the item to order relationship. /EDIT
I'll be doing this in the Django framework.
I'm not really sure what you are asking; because ordering is one thing, and grouping of related objects is something else entirely.
Databases don't store the order of things, but rather the relationships (grouping) of things. The order of things is a user interface detail and not something that a database should be used for.
In django, you can create a ManyToMany relationship. This essentially creates a "box" where you can add and remove items that are related to a particular model. Here is the example from the documentation:
from django.db import models
class Publication(models.Model):
title = models.CharField(max_length=30)
# On Python 3: def __str__(self):
def __unicode__(self):
return self.title
class Meta:
ordering = ('title',)
class Article(models.Model):
headline = models.CharField(max_length=100)
publications = models.ManyToManyField(Publication)
# On Python 3: def __str__(self):
def __unicode__(self):
return self.headline
class Meta:
ordering = ('headline',)
Here an Article can belong to many Publications, and Publications have one or more Articles associated with them:
a = Article.create(headline='Hello')
b = Article.create(headline='World')
p = Publication.create(title='My Publication')
p.article_set.add(a)
p.article_set.add(b)
p.save()
# You can also add an article to a publication from the article object:
c = Article.create(headline='The Answer is 42')
c.publications.add(p)
To know how many articles belong to a publication:
Publication.objects.get(title='My Publication').article_set.count()
An NDB model contains two properties: email and password. How to avoid adding to the database two records with the same email? NDB doesn't have UNIQUE option for a property, like relational databases do.
Checking that new email is not in the database before adding—won't satisfy me, because two parallel processes can both simultaneously do the checking and each add the same email.
I'm not sure that transactions can help here, I am under this impression after reading some of the manuals. Maybe the synchronous transactions? Does it mean one at a time?
Create the key of the entity by email, then use get_or_insert to check if exists.
Also read about keys , entities. and models
#ADD
key_a = ndb.Key(Person, email);
person = Person(key=key_a)
person.put()
#Insert unique
a = Person.get_or_insert(email)
or if you want to just check
#ADD
key_a = ndb.Key(Person, email);
person = Person(key=key_a)
person.put()
#Check if it's added
new_key_a =ndb.Key(Person, email);
a = new_key_a.get()
if a is not None:
return
Take care. Changing email will be really difficult (need to create new entry and copy all entries to new parent).
For that thing maybe you need to store the email, in another entity and have the User be the parent of that.
Another way is to use Transactions and check the email property. Transaction's work in the way: First that commits is the First that wins. A concept which means that if 2 users check for email only the first (lucky) one will succeed, thus your data will be consistent.
Maybe you are looking for the webapp2-authentication module, that can handle this for you. It can be imported like this import webapp2_extras.appengine.auth.models. Look here for a complete example.
I also ran into this problem, and the solution above didn't solve my problem:
making it a key was unacceptable in my case (i need the property to be changeable in the future)
using transactions on the email property doesn't work AFAIK (you can't do queries on non-key names inside transactions, so you can't check whether the e-mail already exists).
I ended up creating a separate model with no properties, and the unique property (email address) as the key name. In the main model, I store a reference to the email model (instead of storing the email as a string). Then, I can make 'change_email' a transaction that checks for uniqueness by looking up the email by key.
This is something that I've come across as well and I settled on a variation of #Remko's solution. My main issue with checking for an existing entity with the given email is a potential race condition like op stated. I added a separate model that uses an email address as the key and has a property that holds a token. By using get_or_insert, the returned entities token can be checked against the token passed in and if they match then the model was inserted.
import os
from google.appengine.ext import ndb
class UniqueEmail(ndb.Model):
token = ndb.StringProperty()
class User(ndb.Model):
email = ndb.KeyProperty(kind=UniqueEmail, required=True)
password = ndb.StringProperty(required=True)
def create_user(email, password):
token = os.urandom(24)
unique_email = UniqueEmail.get_or_insert(email,
token=token)
if token == unique_email.token:
# If the tokens match, that means a UniqueEmail entity
# was inserted by this process.
# Code to create User goes here.
# The tokens do not match, therefore the UniqueEmail entity
# was retrieved, so the email is already in use.
raise ValueError('That user already exists.')
I implemented a generic structure to control unique properties. This solution can be used for several kinds and properties. Besides, this solution is transparent for other developers, they use NDB methods put and delete as usual.
1) Kind UniqueCategory: a list of unique properties in order to group information. Example:
‘User.nickname’
2) Kind Unique: it contains the values of each unique property. The key is the own property value which you want to control of. I save the urlsafe of the main entity instead of the key or key.id() because is more practical and it doesn’t have problem with parent and it can be used for different kinds. Example:
parent: User.nickname
key: AVILLA
reference_urlsafe: ahdkZXZ-c3RhcnQtb3BlcmF0aW9uLWRldnINCxIEVXNlciIDMTIzDA (User key)
3) Kind User: for instance, I want to control unique values for email and nickname. I created a list called ‘uniqueness’ with the unique properties. I overwritten method put in transactional mode and I wrote the hook _post_delete_hook when one entity is deleted.
4) Exception ENotUniqueException: custom exception class raised when some value is duplicated.
5) Procedure check_uniqueness: check whether a value is duplicated.
6) Procedure delete_uniqueness: delete unique values when the main entity is deleted.
Any tips or improvement are welcome.
class UniqueCategory(ndb.Model):
# Key = [kind name].[property name]
class Unique(ndb.Model):
# Parent = UniqueCategory
# Key = property value
reference_urlsafe = ndb.StringProperty(required=True)
class ENotUniqueException(Exception):
def __init__(self, property_name):
super(ENotUniqueException, self).__init__('Property value {0} is duplicated'.format(property_name))
self. property_name = property_name
class User(ndb.Model):
# Key = Firebase UUID or automatically generated
firstName = ndb.StringProperty(required=True)
surname = ndb.StringProperty(required=True)
nickname = ndb.StringProperty(required=True)
email = ndb.StringProperty(required=True)
#ndb.transactional(xg=True)
def put(self):
result = super(User, self).put()
check_uniqueness (self)
return result
#classmethod
def _post_delete_hook(cls, key, future):
delete_uniqueness(key)
uniqueness = [nickname, email]
def check_uniqueness(entity):
def get_or_insert_unique_category(qualified_name):
unique_category_key = ndb.Key(UniqueCategory, qualified_name)
unique_category = unique_category_key.get()
if not unique_category:
unique_category = UniqueCategory(id=qualified_name)
unique_category.put()
return unique_category_key
def del_old_value(key, attribute_name, unique_category_key):
old_entity = key.get()
if old_entity:
old_value = getattr(old_entity, attribute_name)
if old_value != new_value:
unique_key = ndb.Key(Unique, old_value, parent=unique_category_key)
unique_key.delete()
# Main flow
for unique_attribute in entity.uniqueness:
attribute_name = unique_attribute._name
qualified_name = type(entity).__name__ + '.' + attribute_name
new_value = getattr(entity, attribute_name)
unique_category_key = get_or_insert_unique_category(qualified_name)
del_old_value(entity.key, attribute_name, unique_category_key)
unique = ndb.Key(Unique, new_value, parent=unique_category_key).get()
if unique is not None and unique.reference_urlsafe != entity.key.urlsafe():
raise ENotUniqueException(attribute_name)
else:
unique = Unique(parent=unique_category_key,
id=new_value,
reference_urlsafe=entity.key.urlsafe())
unique.put()
def delete_uniqueness(key):
list_of_keys = Unique.query(Unique.reference_urlsafe == key.urlsafe()).fetch(keys_only=True)
if list_of_keys:
ndb.delete_multi(list_of_keys)
Do I pay a penalty on query performance if I choose to query repeated property? For example:
class User(ndb.Model):
user_name = ndb.StringProperty()
login_providers = ndb.KeyProperty(repeated=true)
fbkey = ndb.Key("ProviderId", 1, "ProviderName", "FB")
for entry in User.query(User.login_providers == fbkey):
# Do something with entry.key
vs
class User(ndb.Model)
user_name = ndb.StringProperty()
class UserProvider(ndb.Model):
user_key = ndb.KeyProperty(kind=User)
login_provider = ndb.KeyProperty()
for entry in UserProvider.query(
UserProvider.user_key == auserkey,
UserProvider.login_provider == fbkey
):
# Do something with entry.user_key
Based on the documentation from GAE, it seems that Datastore take care of indexing and the first less verbose option would be using the index. However, I failed to find any documentation to confirm this.
Edit
The sole purpose of UserProvider in the second example is to create a one-to-many relationship between a user and it's login_provider. I wanted to understand if it worth the trouble of creating a second entity instead of querying on repeated property. Also, assume that all I need is the key from the User.
No. But you'll raise your write costs because each entry needs to be indexed, and write costs are based on the number of indexes updated.
I use the key of another User, the sponsor, to indicate who is the sponsor of a User and it creates a link in the datastore for those Users that have a sponsor and it can be at most one but a sponsor can sponsor many users like in this case ID 2002 who sponsored three other users:
In this case this query does what I want: SELECT * FROM User where sponsor =KEY('agtzfmJuYW5vLXd3d3ILCxIEVXNlchjSDww') but I don't know how to program that with python, I can only use it to the datastore. How can I query by key when I want to match the set of users who has the same user as key in the same field? A user in my model can have at most one sponsor and I just want to know who a particular person sponsored which could be a list of users and then they sponsored users in their turn which I also want to query on.
The field sponsor is a key and it has a link to the sponsor in the datastore. I set the key just like user2.sponsor = user1.key and now I want to find all that user1 sponsored with a query that should be just like
User.All().filter('sponsor = ', user1.key)
but sponsor is a field of type key so I don't know how to match it to see for example a list a people the active user is a sponsor for and how it becomes a tree when the second generation also have links. How to select the list of users this user is a sponsor for and then the second generation? When i modelled the relation simply like u1=u2.key ie user2.sponsor=user1.key. Thanks for any hint
The following workaround is bad practice but is my last and only resort:
def get(self):
auser = self.auth.get_user_by_session()
realuser = auth_models.User.get_by_id(long( auser['user_id'] ))
q = auth_models.User.query()
people = []
for p in q:
try:
if p.sponsor == realuser.key:
people.append(p)
except Exception, e:
pass
if auser:
self.render_jinja('my_organization.html', people=people, user=realuser,)
Update
The issues are that the keyproperty is not required and that Guido Van Rossum has reported this as a bug in the ndb when I think it's a bug in my code. Here's what I'm using now, which is a very acceptable solution since every real user in the organization except possibly programmers, testers and admins are going the be required to have a sponsor ID which is a user ID.
from ndb import query
class Myorg(NewBaseHandler):
#user_required
def get(self):
user = auth_models.User.get_by_id(long(self.auth.get_user_by_session()['user_id']))
people = auth_models.User.query(auth_models.User.sponsor == user.key).fetch()
self.render_jinja('my_organization.html', people=people,
user=user)
class User(model.Expando):
"""Stores user authentication credentials or authorization ids."""
#: The model used to ensure uniqueness.
unique_model = Unique
#: The model used to store tokens.
token_model = UserToken
sponsor = KeyProperty()
created = model.DateTimeProperty(auto_now_add=True)
updated = model.DateTimeProperty(auto_now=True)
# ID for third party authentication, e.g. 'google:username'. UNIQUE.
auth_ids = model.StringProperty(repeated=True)
# Hashed password. Not required because third party authentication
# doesn't use password.
password = model.StringProperty()
...
The User model is an NDB Expando which is a little bit tricky to query.
From the docs
Another useful trick is querying an Expando kind for a dynamic
property. You won't be able to use class.query(class.propname ==
value) as the class doesn't have a property object. Instead, you can
use the ndb.query.FilterNode class to construct a filter expression,
as follows:
from ndb import model, query
class X(model.Expando):
#classmethod
def query_for(cls, name, value):
return cls.query(query.FilterNode(name, '=', value))
print X.query_for('blah', 42).fetch()
So try:
form ndb import query
def get(self):
auser = self.auth.get_user_by_session()
realuser = auth_models.User.get_by_id(long( auser['user_id'] ))
people = auth_models.User.query(query.FilterNode('sponsor', '=', realuser.key)).fetch()
if auser:
self.render_jinja('my_organization.html', people=people, user=realuser,)
Option #2
This option is a little bit cleaner. You subclass the model and pass it's location to webapp2. This will allow you to add custom attributes and custom queries to the class.
# custom_models.py
from webapp2_extras.appengine.auth.models import User
from google.appengine.ext.ndb import model
class CustomUser(User):
sponsor = model.KeyProperty()
#classmethod
def get_by_sponsor_key(cls, sponsor):
# How you handle this is up to you. You can return a query
# object as shown, or you could return the results.
return cls.query(cls.sponsor == sponsor)
# handlers.py
def get(self):
auser = self.auth.get_user_by_session()
realuser = custom_models.CustomUser.get_by_id(long( auser['user_id'] ))
people = custom_models.CustomUser.get_by_sponsor_key(realuser.key).fetch()
if auser:
self.render_jinja('my_organization.html', people=people, user=realuser,)
# main.py
config = {
# ...
'webapp2_extras.auth': {
# Tell webapp2 where it can find your CustomUser
'user_model': 'custom_models.CustomUser',
},
}
application = webapp2.WSGIApplication(routes, config=config)
I'm trying to understand how you're supposed to access items in a GAE db.ListProperty(db.Key).
Example:
A Magazine db.Model entity has a db.ListProperty(db.Key) that contains 10 Article entities. I want to get the Magazine object and display the Article names and dates. Do I make 10 queries for the actual article objects? Do I do a batch query? What if there's 50 articles? (Don't batch queries rely on the IN operator, which is limited to 30 or fewer elements?)
So you are describing something like this:
class Magazine(db.Model):
ArticleList = db.ListProperty(db.Key)
class Article(db.Model):
ArticleName = db.StringProperty()
ArticleDate = db.DateProperty()
In this case the simplest way to grab the listed articles is to use the Model.get() method, which looks for a key list.
m = Magazine.get() #grab the first record
articles = Article.get(m.ArticleList) #get Articles using key list
for a in articles:
name = a.ArticleName
date = a.ArticleDate
#do something with this data
Depending on how you plan on working with the data you may be better off adding a Magazine reference property to your Article entities instead.
You need to read Modeling Entity Relationships, especially the part about one to many.