Appengine ndb - How to ensure unique username and email without ancestors? - google-app-engine

In my Appengine (using ndb) application I store users and both username and email need to be unique.
I also need to be able to update progress (save level if higher than previously stored level), change email and pw and delete account.
I noticed that it is not possible to query without ancestors in a transaction. But creating an ancestor is NOT a solution since that would limit the number of writes to 1 per second which is not OK if the app gets popular. So I need another solution.
Is it possible to use the Key? Yes, but that only makes the username unique, how can I make sure noone is reusing the email for another account?

You should be able to use a cross group transaction for this along with an entity that exists solely for reserving email addresses.
For your User entity, you could use the username as the key name. When creating a user, you also create an EmailReservation entity that has the user's email address as a key name.
You then use a cross-group transaction to create a new user:
#ndb.transactional(xg=True)
def create_user(user_name, email):
user = User.get_by_id(user_name)
email_reservation = EmailReservation.get_by_id(email)
if user or email_reservation:
# Either the user_name or email is already in use so stop
return None
# Create the user and reserve the email address so others can't use it
user = User(id=user_name)
email_reservation = EmailReservation(id=email)
ndb.put_multi(user, email_reservation)
return user

Related

Change appengine ndb key

I have a game where I've (foolishly) made the db key equal to the users login email. I did this several years ago so I've got quite a few users now. Some users have asked to change their email login for my game. Is there a simple way to change the key? As far as I can tell I'd need to make a new entry with the new email and copy all the data across, then delete the old db entry. This is the user model but then I've got other models, like one for each game they are involved in, that store the user key so I'd have to loop though all of them as well and swap out for the new key.
Before I embark on this I wanted to see if anyone else had a better plan. There could be several models storing that old user key so I'm also worried about the process timing out.
It does keep it efficient to pull a db entry as I know the key from their email without doing a search, but it's pretty inflexible in hindsight
This is actually why you don't use a user's email as their key. Use ndb's default randomly generated key ids.
The efficiency you're referring is not having to query the user's email to retrieve the user id. But that only happens once on user login or from your admin screens when looking at someones account.
You should rip the bandade off now and do a schema-migration away from this model.
Create a new user model (i.e. UsersV2) and clone your existing user model into it to generate new ids.
On all models that reference it add a duplicate field user_v2 = ndb.KeyProperty(UsersV2) and populate it with the new key.
Delete the legacy user model
You should use the taskqueue to do something like this and then you won't have to worry about the process timing out:
https://cloud.google.com/appengine/articles/update_schema
Alternatively, if you are determined to do this cascading update everytime a user changes an email, you could set up a similar update_schema task for just that user.
I ended up adding a new property to my user model and running a crawler to copy the string key (the email) to that new property. I changed my code search for that property rather then the key string to get a user item. Most of my users still have keys that equal their email, but I can safely ignore them as if the string is meaningless. I can now change a users email easily without making a new recored and my other models that have pointers to these user keys can remain unchanged.

Sharded ancestor entities in GAE

I'm working on a GAE-based project involving a large user base (possibly millions of users). We use Datastore for persistency. Users will be identified both by username and by e-mail address, so these two properties should be unique across all entities of the kind. Because Datastore doesn't support unique fields other than ID, we need transactions to ensure uniqueness of these fields when new users are registered. And in order to have transactions, User entities need to be enclosed in entity groups.
Having large entity groups is not recommended, as pointed out here. Therefore, given a possible large number of stored users, I'm thinking of putting them into multiple smaller entity groups. Each group would have a common parent with ID generated from the two unique fields (a piece of the MD5 sum for instance). Inserting a new user could look like this (in Python):
#ndb.transactional
def register_new_user(login, email, full_name) :
# validation code omitted
user = User(login = login, email = email, full_name = full_name)
group_id = a_simple_hash(login, email)
group_key = ndb.Key('UserGroup', group_id)
query = User.query(ancestor = group_key).filter(ndb.OR(User.login = login, User.email = email))
if not query.get() :
user.put()
One problem I see with this solution is that it will be impossible to get a User by ID alone. We'd have to use complete entity keys.
Are there any other cons of such approach? Anyone tried something similar?
EDIT
As I've been pointed out in comments, a hash like the one outlined above would not work properly because it would only prevent registering users having non-unique e-mails together with non-unique usernames matching those e-mails. It would work if the hash was computed based on a single field.
Nevertheless, I find the concept of such sharding interesting by itself and perhaps worth of discussion.
An e-mail address is owned by a user and unique. So there is a very small change, somebody will (try to) use the same email address.
So my approch would be: get_or_insert a new login, which makes it easy to login (by key) and next verify if the e-mail address is unique.
If it not unique you can discard or .....do something else
Entity groups have meaning for transactions. I'am interested in your planned transactions, because I do not understand your entity group key hash. Which entities will be part of the entity group, and why?
A user with the same login will be part of another entity group, If i do understand your hash?
It looks like your entity group holds a single entity.
In my opinion you're overthinking here : what's the probability of having two users register with the same username at the same time ?
Very slim. Eventual consistency is good enough for this case, as you don't nanosecond precision...
unless you plan to have more users than facebook, with people registering every second.
Registering with the same email is virtually impossible for different users, since the check has already been done by the email provider for you!
Only a user could try to open two accounts with the same email address. Eventual consistency is good enough for this query too.
Your user entities each belong to their own entity group.
Actually in most use cases, your User is the most obvious root entity : people use the datastore because they need scalability, and most of the time huge scale is needed for user oriented apps.

Consistency in a Login Model using the Google Cloud Datastore

I'm trying to get my head around a login model that uses several authentication methods.
1.) For example, when a new user tries to log in with OpenID my backend is going to insert two entities into the datastore:
Insert a new user, where the automatically inserted id will be his $userId
(kind: User, id: autoId)
Insert a new login that is linked to the $userId
(kind: AuthOpenid, name: $openId), Property(userId: $userId)
This will allow me to make lookup by key requests when a user tries to log in, which enforces strongly consistent data, right?
The idea is that one user can have many logins (like stackexchange) and I don't have to worry about write/read limits because no entities have ancestors while still enforcing consistency.
2.) On a related note: Assuming my users are allowed to pick a username once they have provided an authentication method, how do I efficiently check if a username is taken?
My idea was to insert a new entity for every picked username.
Insert a new username
(kind: Username, name: $username)
Now I can simply make a lookup by key request to see if a username is taken. As far as I know, common lookups will be stored in memcache anyways, so this should be efficient, right?
I could also reverse the procedure and just attempt to insert a username and see if it fails.
1) Your approach looks good. As you've noted, Lookup operations (lookup by key) are guaranteed to return consistent results.
You're also correct that by putting each AuthOpenid entity in its own entity group (no common ancestor), you will avoid the write throughput limit of 1 write/second on any particular entity group (there's no corresponding limit on rate of entity group creation).
2) This will also work, but you will need to execute the read and write operations as part of a transaction. This ensures that if two users try to reserve the same username, only one of them will succeed.
In Cloud Datastore, an insert mutation will fail if an entity with the same key already exists, so this will also work.
(Note that this is different from the put() operation in the App Engine Datastore which uses upsert semantics.)

Which Property type do I use to put a User's user_id() in the Datastore?

This is why I need to put user_id()'s in the Datastore:
A User value in the datastore does not
get updated if the user changes her
email address. This may be remedied in
a future release. Until then, you can
use the User value's user_id() as the
user's stable unique identifier.
http://code.google.com/appengine/docs/python/datastore/typesandpropertyclasses.html#users_User
Within the datastore, the value is
equal to the email address plus the
user's unique ID. If the user changes
her email address, the new User value
will not equal the original User value
in datastore queries or when compared
by the app. If your app needs a stable
identifier that does not change, you
can store the unique ID separately
from the User value.
http://code.google.com/appengine/docs/python/users/userobjects.html
How do I do it?
And how did you know (or where did you look it up?)
The usual answer is "no property": If you need a stable identifier, it's probably because you're looking users up by it, in which case you should use it as the key name for the entity, like so:
class UserInfo(db.Model):
is_admin = db.BooleanProperty(required=True, default=False)
user_info = UserInfo.get_or_insert(users.get_current_user().user_id())
If you really need to store an identifier for the user elsewhere, you can use a db.StringProperty to store the User ID.

What are the fields that the user table should contain from the security/authenication perspective?

When designing user table what would be the must have fields from the security/user authentication point of view for a Web based Application (.NET and SqlServer 2005)
I came with with the following fields:
userID
username -- preferably email
passwordHash
onceUsePassword -- to indicate that the password should be changed after login
alternativeContactEmail
userStatusID -- FK to a lookup table with statuses like: active, diabled etc
dateCreated
dateUpdated
lastPasswordUpdate
lastLogon
-- and then the rest like :forename, surname etc which are not of the interest in this question
Am I missing something?
Is standard identity (INT) sufficient for userID or should the GUID be used instead (the userID is not going to be exposed anywhere)?
EDIT:
I am limited to the use of .NET 1.1
(don't ask...)
The salt info will be merged with passwordHash
the account would be unlocked by sending a temporary, single use system generated password to the user email address (hence onceUsePassword field)
Why not just use the built-in SQL Membership Provider if you're using SQL Server anyway? It's much better than rolling your own since it's been tested by a lot of people.
In any case, you should think about adding a salt field your table.
Salting
Update:
.NET 1.1? I guess that answers my question. Is your application for the consumption of the general public? If so, you might want to add a way for them to unlock their accounts via a secret question.
onceUsePassword -- to indicate that
the password should be changed after
login
If you have to explain it that much, you should rename it. Something like "forceChangePasswordOnLogin".
You should add a "salt" field to use password salting to avoid dictionary attacks with rainbow tables if your database ever got compromised.
I'm not sure what you mean by "The salt info will be merged with passwordHash". Does that mean that the same salt is used for all password hashs? Would make more sense to generate a random salt for each hash, and store it in a separate field.

Resources