Maintain uniqueness of a property in the NDB database - google-app-engine

An NDB model contains two properties: email and password. How can I avoid adding two records with the same email to the database? NDB doesn't have a UNIQUE option for a property, like relational databases do.
Checking that the new email is not in the database before adding it won't satisfy me, because two parallel processes can both perform the check at the same time and then each add the same email.
I'm not sure that transactions can help here; that is my impression after reading some of the manuals. Maybe synchronous transactions? Does that mean one at a time?

Create the entity's key from the email, then use get_or_insert to check whether it exists.
Also read about keys, entities, and models.
# Add
key_a = ndb.Key(Person, email)
person = Person(key=key_a)
person.put()

# Insert unique
a = Person.get_or_insert(email)
Or, if you just want to check:
# Add
key_a = ndb.Key(Person, email)
person = Person(key=key_a)
person.put()

# Check if it's added
new_key_a = ndb.Key(Person, email)
a = new_key_a.get()
if a is not None:
    return
Take care: changing the email will be really difficult (you need to create a new entry and copy all entries to the new parent).
For that, you may need to store the email in another entity and have the User be its parent.
Another way is to use transactions and check the email property. Transactions work on a first-commit-wins basis: if two users check for the same email, only the first (lucky) one will succeed, so your data will stay consistent.
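To see why an atomic get-or-insert closes the race described in the question, here is a toy simulation. It does not use NDB: InMemoryStore, register_concurrently and the lock standing in for the datastore's atomic key lookup are all made up for illustration.

```python
import threading


class InMemoryStore(object):
    """Toy stand-in for a datastore keyed by a unique string (not NDB)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def get_or_insert(self, key, value):
        """Return (stored_value, created); the check and the write are atomic."""
        with self._lock:
            if key in self._rows:
                return self._rows[key], False
            self._rows[key] = value
            return value, True


def register_concurrently(store, email, workers=8):
    """Let several threads try to claim the same email; record who created it."""
    results = []
    results_lock = threading.Lock()

    def worker(worker_id):
        _, created = store.get_or_insert(email, {'owner': worker_id})
        with results_lock:
            results.append((worker_id, created))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


store = InMemoryStore()
outcome = register_concurrently(store, 'a@example.com')
winners = [worker_id for worker_id, created in outcome if created]
```

However the threads interleave, exactly one worker ends up in winners, because the existence check and the insert happen as one step on the key.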

Maybe you are looking for the webapp2-authentication module, which can handle this for you. It can be imported like this: import webapp2_extras.appengine.auth.models. Look here for a complete example.

I also ran into this problem, and the solutions above didn't solve it for me:
making it a key was unacceptable in my case (I need the property to be changeable in the future)
using transactions on the email property doesn't work AFAIK (you can't do queries on non-key names inside transactions, so you can't check whether the email already exists).
I ended up creating a separate model with no properties, and the unique property (email address) as the key name. In the main model, I store a reference to the email model (instead of storing the email as a string). Then, I can make 'change_email' a transaction that checks for uniqueness by looking up the email by key.
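The change_email idea can be sketched without the datastore. The following is a toy model, not NDB code: Registry, EmailAlreadyUsed and the lock that stands in for a transaction are hypothetical names chosen for illustration. The point it shows is that uniqueness is checked by looking the email up as a key, never by a query.

```python
import threading


class EmailAlreadyUsed(Exception):
    pass


class Registry(object):
    """Toy model: `emails` plays the role of the key-only email kind."""

    def __init__(self):
        self.emails = {}   # email -> user_id (the email is the key)
        self.users = {}    # user_id -> current email
        self._txn = threading.Lock()  # stands in for a datastore transaction

    def create_user(self, user_id, email):
        with self._txn:
            if email in self.emails:
                raise EmailAlreadyUsed(email)
            self.emails[email] = user_id
            self.users[user_id] = email

    def change_email(self, user_id, new_email):
        with self._txn:
            if new_email in self.emails:      # lookup by key, not a query
                raise EmailAlreadyUsed(new_email)
            old_email = self.users[user_id]
            del self.emails[old_email]        # release the old address
            self.emails[new_email] = user_id  # claim the new one
            self.users[user_id] = new_email


registry = Registry()
registry.create_user('u1', 'old@example.com')
registry.create_user('u2', 'taken@example.com')
registry.change_email('u1', 'new@example.com')
try:
    registry.change_email('u1', 'taken@example.com')
    conflict = False
except EmailAlreadyUsed:
    conflict = True
```

After the first change, the old address is free again for other users, and the second attempt fails because the target address is already claimed.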

This is something that I've come across as well, and I settled on a variation of @Remko's solution. My main issue with checking for an existing entity with the given email is the potential race condition that the OP described. I added a separate model that uses the email address as the key and has a property that holds a token. With get_or_insert, the returned entity's token can be checked against the token passed in; if they match, the entity was inserted by this call.
import binascii
import os

from google.appengine.ext import ndb


class UniqueEmail(ndb.Model):
    token = ndb.StringProperty()


class User(ndb.Model):
    email = ndb.KeyProperty(kind=UniqueEmail, required=True)
    password = ndb.StringProperty(required=True)


def create_user(email, password):
    # Hex-encode the random bytes so the token is a valid string value.
    token = binascii.hexlify(os.urandom(24)).decode('ascii')
    unique_email = UniqueEmail.get_or_insert(email, token=token)
    if token == unique_email.token:
        # If the tokens match, that means a UniqueEmail entity
        # was inserted by this process.
        # Code to create User goes here; a minimal version:
        user = User(email=unique_email.key, password=password)
        user.put()
        return user
    # The tokens do not match, therefore the UniqueEmail entity
    # was retrieved, so the email is already in use.
    raise ValueError('That user already exists.')

I implemented a generic structure to control unique properties. This solution can be used for several kinds and properties. It is also transparent to other developers: they use the NDB methods put and delete as usual.
1) Kind UniqueCategory: a list of unique properties, used to group information. Example:
‘User.nickname’
2) Kind Unique: it contains the values of each unique property. The key is the property value that you want to keep unique. I save the urlsafe string of the main entity instead of the key or key.id() because it is more practical, it has no problem with parents, and it can be used for different kinds. Example:
parent: User.nickname
key: AVILLA
reference_urlsafe: ahdkZXZ-c3RhcnQtb3BlcmF0aW9uLWRldnINCxIEVXNlciIDMTIzDA (User key)
3) Kind User: for instance, I want to control unique values for email and nickname. I created a list called ‘uniqueness’ with the unique properties. I overrode the put method in transactional mode and wrote the _post_delete_hook hook, which runs when an entity is deleted.
4) Exception ENotUniqueException: custom exception class raised when a value is duplicated.
5) Procedure check_uniqueness: checks whether a value is duplicated.
6) Procedure delete_uniqueness: deletes unique values when the main entity is deleted.
Any tips or improvements are welcome.
from google.appengine.ext import ndb


class UniqueCategory(ndb.Model):
    # Key = [kind name].[property name]
    pass


class Unique(ndb.Model):
    # Parent = UniqueCategory
    # Key = property value
    reference_urlsafe = ndb.StringProperty(required=True)


class ENotUniqueException(Exception):
    def __init__(self, property_name):
        super(ENotUniqueException, self).__init__(
            'Property value {0} is duplicated'.format(property_name))
        self.property_name = property_name


class User(ndb.Model):
    # Key = Firebase UUID or automatically generated
    firstName = ndb.StringProperty(required=True)
    surname = ndb.StringProperty(required=True)
    nickname = ndb.StringProperty(required=True)
    email = ndb.StringProperty(required=True)

    @ndb.transactional(xg=True)
    def put(self):
        result = super(User, self).put()
        check_uniqueness(self)
        return result

    @classmethod
    def _post_delete_hook(cls, key, future):
        delete_uniqueness(key)

    uniqueness = [nickname, email]


def check_uniqueness(entity):
    def get_or_insert_unique_category(qualified_name):
        unique_category_key = ndb.Key(UniqueCategory, qualified_name)
        unique_category = unique_category_key.get()
        if not unique_category:
            unique_category = UniqueCategory(id=qualified_name)
            unique_category.put()
        return unique_category_key

    def del_old_value(key, attribute_name, unique_category_key):
        # new_value is read from the enclosing loop below.
        old_entity = key.get()
        if old_entity:
            old_value = getattr(old_entity, attribute_name)
            if old_value != new_value:
                unique_key = ndb.Key(Unique, old_value, parent=unique_category_key)
                unique_key.delete()

    # Main flow
    for unique_attribute in entity.uniqueness:
        attribute_name = unique_attribute._name
        qualified_name = type(entity).__name__ + '.' + attribute_name
        new_value = getattr(entity, attribute_name)
        unique_category_key = get_or_insert_unique_category(qualified_name)
        del_old_value(entity.key, attribute_name, unique_category_key)
        unique = ndb.Key(Unique, new_value, parent=unique_category_key).get()
        if unique is not None and unique.reference_urlsafe != entity.key.urlsafe():
            raise ENotUniqueException(attribute_name)
        else:
            unique = Unique(parent=unique_category_key,
                            id=new_value,
                            reference_urlsafe=entity.key.urlsafe())
            unique.put()


def delete_uniqueness(key):
    list_of_keys = Unique.query(Unique.reference_urlsafe == key.urlsafe()).fetch(keys_only=True)
    if list_of_keys:
        ndb.delete_multi(list_of_keys)

Related

SQLAlchemy: foreignKeys from multiple Tables (Many-to-Many)

I'm using the flask-sqlalchemy ORM in my Flask app, which is about smart home sensors and actors (for the sake of simplicity let's call them Nodes).
Now I want to store an Event that is bound to Nodes in order to check their state, and to other or the same Nodes that should be set to a given value once the state of the first ones has reached a threshold.
Additionally, the states could be checked or set from/for Groups or Scenes. So I have three different foreign keys to check and another three to set. There can be more than one of each type, and multiple types per Event.
Here is example code with the db.Models and pseudocode for what I expect to be stored in an Event:
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()


class Node(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    value = db.Column(db.String(20))
    # columns snipped out


class Group(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    value = db.Column(db.String(20))
    # columns snipped out


class Scene(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    value = db.Column(db.String(20))
    # columns snipped out


class Event(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    # The following columns may be in an intermediate table,
    # but I have no clue how to design that under these conditions:
    # constraints: list of foreign keys from different tables (Node/Group/Scene)
    #              with a threshold per key
    # target: list of foreign keys from different tables (Node/Group/Scene)
    #         with target values per key
In the end I want to be able to check if any of my Events are true to set the bound Node/Group/Scene accordingly.
It may be a database design problem (and not sqlalchemy) but I want to make use of the advantages of sqla orm here.
Inspired by this and that answer I tried to dig deeper, but other questions on SO were about more specific problems or one-to-many relationships.
Any hints or design tips are much appreciated. Thanks!
I ended up with a trade-off between usage and lines of code. My first thought here was to save as much code as I can (DRY) and to define as few tables as possible.
As SQLAlchemy itself points out in one of its examples, the "generic foreign key" is supported only because it was often requested, not because it is a good solution. With it, less database functionality is used, and the application has to take care of key constraints instead.
On the other hand, they say that having more tables in your database does not affect database performance.
So I tried some approaches and found a good one that fits my use case. Instead of a "normal" intermediate table for many-to-many relationships, I use another SQLAlchemy class that has a one-to-many relation on each side to connect the two tables.
class Event(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    # Must match back_populates="nodes" on NodeEvent.events.
    nodes = db.relationship('NodeEvent', back_populates='events')
    # columns snipped out

    def get_as_dict(self):
        return {
            "id": self.id,
            "nodes": [n.get_as_dict() for n in self.nodes]
        }


class Node(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    value = db.Column(db.String(20))
    events = db.relationship('NodeEvent', back_populates='node')
    # columns snipped out


class NodeEvent(db.Model):
    ev_id = db.Column('ev_id', db.Integer, db.ForeignKey('event.id'), primary_key=True)
    n_id = db.Column('n_id', db.Integer, db.ForeignKey('node.id'), primary_key=True)
    value = db.Column('value', db.String(200), nullable=False)
    compare = db.Column('compare', db.String(20), nullable=True)
    node = db.relationship('Node', back_populates="events")
    events = db.relationship('Event', back_populates="nodes")

    def get_as_dict(self):
        return {
            "trigger_value": self.value,
            "actual_value": self.node.status,
            "compare": self.compare
        }
The trade-off is that I have to define a new class every time I bind a new table to that relationship. But with the "generic foreign key" approach I would also have to check where the ForeignKey is coming from. It is the same work at the end of the day.
With my get_as_dict() function I have a very handy access to the related data.
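To make the table layout behind this association class concrete, here is a small sketch using the standard library's sqlite3 instead of SQLAlchemy. Several things in it are assumptions that the original does not state: that compare holds one of the operator symbols in COMPARATORS, that values are numeric strings, and that an event is true only when all of its linked nodes satisfy their thresholds. The function event_is_true is a hypothetical helper.

```python
import operator
import sqlite3

# Assumed meaning of the `compare` column; the original does not define it.
COMPARATORS = {'>': operator.gt, '<': operator.lt, '==': operator.eq}

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE event (id INTEGER PRIMARY KEY);
    CREATE TABLE node (id INTEGER PRIMARY KEY, value TEXT);
    -- Association table that carries its own payload columns.
    CREATE TABLE node_event (
        ev_id INTEGER NOT NULL REFERENCES event(id),
        n_id INTEGER NOT NULL REFERENCES node(id),
        value TEXT NOT NULL,
        compare TEXT,
        PRIMARY KEY (ev_id, n_id)
    );
    INSERT INTO event (id) VALUES (1), (2);
    INSERT INTO node (id, value) VALUES (10, '25'), (11, '3');
    INSERT INTO node_event VALUES
        (1, 10, '20', '>'), (1, 11, '5', '<'), (2, 10, '30', '>');
""")


def event_is_true(conn, event_id):
    """An event holds when every linked node satisfies its own threshold."""
    rows = conn.execute(
        'SELECT node.value, node_event.value, node_event.compare '
        'FROM node_event JOIN node ON node.id = node_event.n_id '
        'WHERE node_event.ev_id = ?', (event_id,)).fetchall()
    return bool(rows) and all(
        COMPARATORS[compare](float(actual), float(threshold))
        for actual, threshold, compare in rows)


event_ids = [row[0] for row in
             conn.execute('SELECT id FROM event ORDER BY id').fetchall()]
triggered = [ev for ev in event_ids if event_is_true(conn, ev)]
```

A Group or Scene link would get its own association table of the same shape, which is the "new class every time" cost mentioned above.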

Convert three models to one single Django query

These are my models, with some of the fields:
class Advertisers(models.Model):
    account_manager_id = models.ForeignKey(
        AccountManagers, on_delete=models.CASCADE, null=True,
        db_column='account_manager_id', related_name="advertisers")


class AdvertiserUsers(models.Model):
    user_id = models.OneToOneField(
        'Users', on_delete=models.CASCADE, null=True,
        db_column='user_id', related_name='advertiser_users')
    advertiser_id = models.ForeignKey(
        'Advertisers', on_delete=models.CASCADE, null=True,
        db_column='advertiser_id', related_name='advertiser_users')


class Users(models.Model):
    email = models.CharField(unique=True, max_length=100)
I want the ids, user ids and emails of all advertisers.
Ids of all advertisers:
advertiser_ids = Advertisers.objects.all() # can get id from here
Find the user ids for advertiser_ids:
user_ids = AdvertiserUsers.objects.filter(advertiser_id__in=advertiser_ids) # can get user_id from here
Find the id and email using this query:
user_ids = Users.objects.filter(id__in=user_ids) # can get email from here
How can I make this shorter, so that by querying directly from Advertisers I can get the Users model's email?
Thank you in advance.
You can filter with:
Users.objects.filter(advertiser_users__advertiser_id__isnull=False).distinct()
The .distinct() will prevent returning the same Users multiple times.
You can annotate the User objects with the Advertisers primary key, etc:
from django.db.models import F
Users.objects.filter(advertiser_users__advertiser_id__isnull=False).annotate(
    account_manager_id=F('advertiser_users__advertiser_id__account_manager_id'),
    advertiser_id=F('advertiser_users__advertiser_id')
)
The Users objects that arise from this have a .email attribute (and the other attributes that belong to a Users object), together with an .account_manager_id and an .advertiser_id. That being said, this is probably not a good idea: the way you have modeled this right now, a Users object can relate to multiple Advertisers objects, so it does not make much sense to add these together.
You can for each user access the related Advertisers with:
myusers = Users.objects.filter(
    advertiser_users__advertiser_id__isnull=False
).prefetch_related(
    'advertiser_users',
    'advertiser_users__advertiser_id'
).distinct()
for user in myusers:
    print(f'{user.email}')
    for advuser in user.advertiser_users.all():
        # The ForeignKey field on AdvertiserUsers is named advertiser_id.
        print(f' {advuser.advertiser_id.pk}')
Note: normally a Django model is given a singular name, so User instead of Users.
Note: Normally one does not add a suffix _id to a ForeignKey field, since Django
will automatically add a "twin" field with an _id suffix. Therefore it should
be account_manager, instead of account_manager_id.
Advertisers.objects.all().values_list('id', 'account_manager_id', 'advertiser_users__user_id', 'advertiser_users__user_id__email')

Google app engine: how to handle concurrency (racing condition)

I am trying to solve the racing problem based on this to prevent duplicate user registrations. So if the account exists or the email has been used, no entity will be created.
@ndb.transactional
def get_or_insert2(account, email):
    accountExists, emailExists = False, False
    entity = Member.get_by_id(account)
    if entity is not None:
        accountExists = True
    if Member.query(Member.email==email).fetch(1):
        emailExists = True
    if not accountExists and not emailExists:
        entity = Member(id=account)
        entity.put()
    return (entity, accountExists, emailExists)
My questions:
1) I got an error message: BadRequestError: Only ancestor queries are allowed inside transactions. What was the problem?
2) Is the code correct? I mean, can it really solve the race condition?
Thanks.
Transactions work on entity groups, and you can include up to 5 entity groups in a cross group transaction. An entity group is handled by a single server (or group, replicated), which means it is able to have consistent internal state when checking data or doing ancestor queries within the entity group.
Regular queries are global, on indexes with eventual consistency. You don't know when all changes from all nodes have been included in an index. You can't lock up the entire datastore to get consistent snapshot state for your transaction. This is a key difference from a regular RDBMS if you're used to consistent index for queries.
For 1), the problem is that you're doing a regular query inside a transaction, which doesn't work, as explained above. The answer to 2) then becomes no: a query can't solve the race condition; you need explicit gets.
You will need a model each for Member, Email and SSN. This is a quick, untested example that hopefully gets you going:
from google.appengine.ext import ndb


class Member(ndb.Model):
    email = ndb.KeyProperty()
    ssn = ndb.KeyProperty()
    # More user properties go here...


class Email(ndb.Model):
    member = ndb.KeyProperty()


class SSN(ndb.Model):
    member = ndb.KeyProperty()


@ndb.tasklet
def get_or_insert2(account, email, ssn):
    created = False
    member_key = ndb.Key(Member, account)
    email_key = ndb.Key(Email, email)
    ssn_key = ndb.Key(SSN, ssn)
    member_obj, email_obj, ssn_obj = yield ndb.get_multi_async([member_key, email_key, ssn_key])
    if member_obj is None and email_obj is None and ssn_obj is None:
        member_obj = Member(key=member_key, email=email_key, ssn=ssn_key)
        email_obj = Email(key=email_key, member=member_key)
        ssn_obj = SSN(key=ssn_key, member=member_key)
        # Store all three entities, including the SSN marker.
        yield ndb.put_multi_async([member_obj, email_obj, ssn_obj])
        created = True
    raise ndb.Return([created, member_obj, email_obj, ssn_obj])


outcome = ndb.transaction(lambda: get_or_insert2(account, email, ssn), xg=True)
I'm not sure whether it works to combine the @ndb.tasklet and @ndb.transactional(xg=True) decorators, and if so, in which order; just try it out.
If you need to query User based on email or ssn, you could for example rename the KeyProperties to *_ref and add something like
@ndb.ComputedProperty
def email(self):
    return self.email_ref.id()
While this ends up being more lines of code than you anticipated, it is conceptually simple and straight forward, and you can easily figure out what's going on when you get back to it later.

google app engine: concurrent user registrations

I know this is a classical problem, but I still don't know how to do it. On Google App Engine, I have a member registration form which uses jQuery's validation to check if a username exists.
There of course is a concurrency problem: several users try to register, enter the same username, Validation finds the username available, and allow them to press "Add" at the approximately same time. Validation wouldn't detect this. In my application, username, email, and Personal ID should all be unique. How do I prevent the following code from having the concurrency problem:
member = Member()
member.username = self.request.get('username')
member.Pid = self.request.get('Pid')
member.email = self.request.get('email')
...
As the uniqueness constraint is on username, you have to use it as key in datastore and use transactions.
def txn():
    key = ndb.Key(Member, username)
    member = key.get()
    if member is not None:
        raise CustomAlreadyExistsException(member)  # This will abort txn
    member = Member(
        id=username,
        Pid=self.request.get('Pid'),
        email=self.request.get('email'),
        ...)
    member.put()

ndb.transaction(txn)
This makes sure only one person can register a username.
The jQuery helper would check if ndb.Key(Member, userid).get() gives a result or not. The GET is not transactional.
To improve usability client side in "reserving" a username after checking availability, you could use memcached as suggested by Daniel, but I'd call YAGNI, skip the complexity and rather let some people get validation error after submitting the form. Note that memcached is best effort and has no guarantees about anything.
If you need guaranteed uniqueness on multiple fields, you have to add Model classes for them and check in a cross group (XG) transaction.
class Pid(ndb.Model):
    member = ndb.KeyProperty()


class Email(ndb.Model):
    member = ndb.KeyProperty()


class Member(ndb.Model):
    pid = ndb.KeyProperty()
    email = ndb.KeyProperty()

    @property
    def pid_value(self):
        return self.pid.id()

    @property
    def email_value(self):
        return self.email.id()


def txn():
    member_key = ndb.Key(Member, username)
    # The kind defined above is Pid.
    pid_key = ndb.Key(Pid, self.request.get('Pid'))
    email_key = ndb.Key(Email, self.request.get('email'))
    member, pid, email = ndb.get_multi([member_key, pid_key, email_key])
    if member is not None or pid is not None or email is not None:
        raise CustomAlreadyExistsException(member, pid, email)  # This will abort txn
    # Create instances referencing each other
    email = Email(key=email_key, member=member_key)
    pid = Pid(key=pid_key, member=member_key)
    member = Member(
        key=member_key,
        pid=pid_key,
        email=email_key,
        ...)
    ndb.put_multi([member, pid, email])

ndb.transaction(txn, xg=True)
This is a great use for memcache. Your Ajax validation function should put an entry into memcache to record that the username has been requested. It should also check both memcache and the datastore to ensure that the username is free. Similarly, the registration code should check memcache to ensure that the current user is the one who requested the username.
This nicely solves your concurrency problem, and the best thing is that entries in memcache expire by themselves, either on a timed basis or when the cache gets too full.
I agree with tesdal.
If you still want to implement the memcache trick suggested by Daniel, you should do something like "memcache.add(usernameA, dummy value, short period);". Then you know that usernameA is reserved for a short period and won't conflict with "memcache.add(usernameB, ..."
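The add-if-absent behaviour that this reservation trick relies on can be sketched with a toy cache. It is not the memcache API: TinyCache, reserve_username, the 'reserve:' key prefix and the injected clock are hypothetical names used only to show the semantics.

```python
import time


class TinyCache(object):
    """Toy cache mimicking add-if-absent with expiry (not the memcache API)."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._items = {}  # key -> (value, expires_at)

    def add(self, key, value, ttl):
        """Store the value only if the key is absent or expired; report success."""
        now = self._clock()
        current = self._items.get(key)
        if current is not None and current[1] > now:
            return False
        self._items[key] = (value, now + ttl)
        return True

    def get(self, key):
        current = self._items.get(key)
        if current is None or current[1] <= self._clock():
            return None
        return current[0]


def reserve_username(cache, username, session_id, ttl=60):
    """Return True if this session now holds (or already held) the reservation."""
    if cache.add('reserve:' + username, session_id, ttl):
        return True
    return cache.get('reserve:' + username) == session_id


fake_now = [1000.0]  # controllable clock so the expiry can be shown
cache = TinyCache(clock=lambda: fake_now[0])
first = reserve_username(cache, 'alice', 'session-1')
second = reserve_username(cache, 'alice', 'session-2')
fake_now[0] += 61
after_expiry = reserve_username(cache, 'alice', 'session-2')
```

Because a real cache may evict entries at any time, a reservation like this only improves the user experience; the transactional check at registration remains the actual guarantee.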

How to create a query for matching keys?

I use the key of another User, the sponsor, to indicate who sponsors a User. This creates a link in the datastore for those Users that have a sponsor. A user can have at most one sponsor, but a sponsor can sponsor many users, as in this case, where ID 2002 sponsored three other users:
In this case this query does what I want: SELECT * FROM User where sponsor =KEY('agtzfmJuYW5vLXd3d3ILCxIEVXNlchjSDww'), but I don't know how to write that in Python; I can only use it in the datastore. How can I query by key when I want to match the set of users that have the same user as the key in the same field? A user in my model can have at most one sponsor, and I just want to know whom a particular person sponsored. That could be a list of users, who in turn sponsored other users, which I also want to query.
The field sponsor is a key and it links to the sponsor in the datastore. I set the key with user2.sponsor = user1.key, and now I want to find everyone that user1 sponsored with a query that should look like
User.All().filter('sponsor = ', user1.key)
but sponsor is a field of type key, so I don't know how to match it to get, for example, a list of people the active user is a sponsor for, or how it becomes a tree when the second generation also has links. How do I select the list of users this user sponsors, and then the second generation? I modelled the relation simply as user2.sponsor = user1.key. Thanks for any hint.
The following workaround is bad practice but is my last and only resort:
def get(self):
    auser = self.auth.get_user_by_session()
    realuser = auth_models.User.get_by_id(long( auser['user_id'] ))
    q = auth_models.User.query()
    people = []
    for p in q:
        try:
            if p.sponsor == realuser.key:
                people.append(p)
        except Exception, e:
            pass
    if auser:
        self.render_jinja('my_organization.html', people=people, user=realuser,)
Update
The issues are that the KeyProperty is not required and that Guido van Rossum has reported this as a bug in ndb, when I think it's a bug in my code. Here's what I'm using now, which is a very acceptable solution, since every real user in the organization, except possibly programmers, testers and admins, is going to be required to have a sponsor ID, which is a user ID.
from ndb import query


class Myorg(NewBaseHandler):
    @user_required
    def get(self):
        user = auth_models.User.get_by_id(long(self.auth.get_user_by_session()['user_id']))
        people = auth_models.User.query(auth_models.User.sponsor == user.key).fetch()
        self.render_jinja('my_organization.html', people=people,
                          user=user)


class User(model.Expando):
    """Stores user authentication credentials or authorization ids."""
    #: The model used to ensure uniqueness.
    unique_model = Unique
    #: The model used to store tokens.
    token_model = UserToken
    sponsor = model.KeyProperty()
    created = model.DateTimeProperty(auto_now_add=True)
    updated = model.DateTimeProperty(auto_now=True)
    # ID for third party authentication, e.g. 'google:username'. UNIQUE.
    auth_ids = model.StringProperty(repeated=True)
    # Hashed password. Not required because third party authentication
    # doesn't use password.
    password = model.StringProperty()
    ...
The User model is an NDB Expando which is a little bit tricky to query.
From the docs
Another useful trick is querying an Expando kind for a dynamic
property. You won't be able to use class.query(class.propname ==
value) as the class doesn't have a property object. Instead, you can
use the ndb.query.FilterNode class to construct a filter expression,
as follows:
from ndb import model, query


class X(model.Expando):

    @classmethod
    def query_for(cls, name, value):
        return cls.query(query.FilterNode(name, '=', value))


print X.query_for('blah', 42).fetch()
So try:
from ndb import query


def get(self):
    auser = self.auth.get_user_by_session()
    realuser = auth_models.User.get_by_id(long( auser['user_id'] ))
    people = auth_models.User.query(query.FilterNode('sponsor', '=', realuser.key)).fetch()
    if auser:
        self.render_jinja('my_organization.html', people=people, user=realuser,)
Option #2
This option is a little bit cleaner. You subclass the model and pass its location to webapp2. This will allow you to add custom attributes and custom queries to the class.
# custom_models.py
from webapp2_extras.appengine.auth.models import User
from google.appengine.ext.ndb import model


class CustomUser(User):
    sponsor = model.KeyProperty()

    @classmethod
    def get_by_sponsor_key(cls, sponsor):
        # How you handle this is up to you. You can return a query
        # object as shown, or you could return the results.
        return cls.query(cls.sponsor == sponsor)


# handlers.py
def get(self):
    auser = self.auth.get_user_by_session()
    realuser = custom_models.CustomUser.get_by_id(long( auser['user_id'] ))
    people = custom_models.CustomUser.get_by_sponsor_key(realuser.key).fetch()
    if auser:
        self.render_jinja('my_organization.html', people=people, user=realuser,)


# main.py
config = {
    # ...
    'webapp2_extras.auth': {
        # Tell webapp2 where it can find your CustomUser
        'user_model': 'custom_models.CustomUser',
    },
}
application = webapp2.WSGIApplication(routes, config=config)
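The question also asks about the second generation. One possible approach is to repeat the sponsor lookup level by level. The sketch below uses a plain dict in place of the datastore; sponsored_by stands in for a query that filters on the sponsor key, and generations, the depth parameter and the sample ids (apart from 2002 with its three sponsored users) are made up for illustration.

```python
def sponsored_by(users, sponsor_id):
    """Stand-in for a query filtering on sponsor == key; returns sorted ids."""
    return sorted(uid for uid, sponsor in users.items() if sponsor == sponsor_id)


def generations(users, root_id, depth=2):
    """Return a list of levels: users sponsored by root, then by those, etc."""
    levels = []
    current = [root_id]
    seen = {root_id}
    for _ in range(depth):
        next_level = []
        for sponsor_id in current:
            for uid in sponsored_by(users, sponsor_id):
                if uid not in seen:      # guard against accidental cycles
                    seen.add(uid)
                    next_level.append(uid)
        if not next_level:
            break
        levels.append(next_level)
        current = next_level
    return levels


# user id -> sponsor id (None means no sponsor)
users = {2002: None, 3001: 2002, 3002: 2002, 3003: 2002, 4001: 3001, 4002: 3003}
levels = generations(users, 2002)
```

With a real datastore each call to sponsored_by would be one query, so a deep or wide tree costs one query per sponsor visited.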

Resources