Consistency in a Login Model using the Google Cloud Datastore - google-app-engine

I'm trying to get my head around a login model that uses several authentication methods.
1.) For example, when a new user tries to log in with OpenID my backend is going to insert two entities into the datastore:
Insert a new user, where the automatically inserted id will be his $userId
(kind: User, id: autoId)
Insert a new login that is linked to the $userId
(kind: AuthOpenid, name: $openId), Property(userId: $userId)
This will allow me to make lookup by key requests when a user tries to log in, which enforces strongly consistent data, right?
The idea is that one user can have many logins (like stackexchange) and I don't have to worry about write/read limits because no entities have ancestors while still enforcing consistency.
2.) On a related note: Assuming my users are allowed to pick a username once they have provided an authentication method, how do I efficiently check if a username is taken?
My idea was to insert a new entity for every picked username.
Insert a new username
(kind: Username, name: $username)
Now I can simply make a lookup by key request to see if a username is taken. As far as I know, common lookups will be stored in memcache anyways, so this should be efficient, right?
I could also reverse the procedure and just attempt to insert a username and see if it fails.

1) Your approach looks good. As you've noted, Lookup operations (lookup by key) are guaranteed to return consistent results.
You're also correct that by putting each AuthOpenid entity in its own entity group (no common ancestor), you will avoid the write throughput limit of 1 write/second on any particular entity group (there's no corresponding limit on rate of entity group creation).
2) This will also work, but you will need to execute the read and write operations as part of a transaction. This ensures that if two users try to reserve the same username, only one of them will succeed.
In Cloud Datastore, an insert mutation will fail if an entity with the same key already exists, so this will also work.
(Note that this is different from the put() operation in the App Engine Datastore which uses upsert semantics.)
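The difference between insert and upsert semantics for reserving a username can be sketched as follows. This is a minimal in-memory illustration, not the Cloud Datastore API: `FakeDatastore`, `AlreadyExists`, and `reserve_username` are hypothetical names, and a lock stands in for the atomicity the real service would provide.

```python
import threading


class AlreadyExists(Exception):
    """Raised when an insert targets a key that is already present."""


class FakeDatastore:
    """In-memory stand-in for a keyed store; NOT the Cloud Datastore API."""

    def __init__(self):
        self._entities = {}
        self._lock = threading.Lock()

    def insert(self, kind, name, properties):
        # Insert semantics: fail if an entity with this key already exists.
        key = (kind, name)
        with self._lock:
            if key in self._entities:
                raise AlreadyExists(key)
            self._entities[key] = dict(properties)

    def upsert(self, kind, name, properties):
        # Upsert semantics: silently overwrite whatever is stored under the key.
        with self._lock:
            self._entities[(kind, name)] = dict(properties)

    def lookup(self, kind, name):
        # Lookup by key; returns None when the entity is missing.
        with self._lock:
            return self._entities.get((kind, name))


def reserve_username(store, username, user_id):
    """Return True if the username was reserved for user_id, else False."""
    try:
        store.insert("Username", username, {"userId": user_id})
        return True
    except AlreadyExists:
        return False


store = FakeDatastore()
first = reserve_username(store, "alice", 1)   # first claim wins
second = reserve_username(store, "alice", 2)  # second claim is rejected
owner = store.lookup("Username", "alice")["userId"]
```

With upsert in place of insert, the second call would silently take over the username, which is why the read-then-write variant needs a transaction.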

Related

Change appengine ndb key

I have a game where I've (foolishly) made the db key equal to the user's login email. I did this several years ago, so I've got quite a few users now. Some users have asked to change their email login for my game. Is there a simple way to change the key? As far as I can tell, I'd need to make a new entry with the new email, copy all the data across, then delete the old db entry. That covers the user model, but I've also got other models, like one for each game they are involved in, that store the user key, so I'd have to loop through all of them as well and swap in the new key.
Before I embark on this I wanted to see if anyone else had a better plan. There could be several models storing that old user key so I'm also worried about the process timing out.
It does keep it efficient to pull a db entry, as I know the key from their email without doing a search, but it's pretty inflexible in hindsight.
This is actually why you don't use a user's email as their key. Use ndb's default randomly generated key ids.
The efficiency you're referring to is not having to query the user's email to retrieve the user id. But that only happens once on user login or from your admin screens when looking at someone's account.
You should rip the band-aid off now and do a schema migration away from this model.
Create a new user model (i.e. UsersV2) and clone your existing user model into it to generate new ids.
On all models that reference it add a duplicate field user_v2 = ndb.KeyProperty(UsersV2) and populate it with the new key.
Delete the legacy user model
You should use the taskqueue to do something like this and then you won't have to worry about the process timing out:
https://cloud.google.com/appengine/articles/update_schema
Alternatively, if you are determined to do this cascading update every time a user changes an email, you could set up a similar update_schema task for just that user.
I ended up adding a new property to my user model and running a crawler to copy the string key (the email) to that new property. I changed my code to search for that property rather than the key string to get a user item. Most of my users still have keys that equal their email, but I can safely treat those key strings as meaningless. I can now change a user's email easily without making a new record, and my other models that have pointers to these user keys can remain unchanged.
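The approach described in that last post (treat the key as opaque and look users up by an email property) can be sketched with a plain dictionary standing in for the datastore. `UserStore` and its methods are hypothetical names for illustration only; with ndb, the property scan would be a query on the email property.

```python
class UserStore:
    """In-memory illustration only; a dict stands in for the datastore."""

    def __init__(self):
        self._users = {}  # opaque key -> properties

    def put(self, key, **properties):
        self._users[key] = dict(properties)

    def get(self, key):
        return self._users.get(key)

    def find_key_by_email(self, email):
        # Stands in for a query on the email property, not a key lookup.
        for key, properties in self._users.items():
            if properties.get("email") == email:
                return key
        return None

    def change_email(self, key, new_email):
        # Only the property changes; the key and anything pointing at it stay put.
        self._users[key]["email"] = new_email


store = UserStore()
# Legacy user: the key is still the original email, now also copied into a property.
store.put("old@example.com", email="old@example.com", level=7)
# Another model keeps a pointer to the user key.
game = {"player_key": "old@example.com"}

store.change_email("old@example.com", "new@example.com")

found_key = store.find_key_by_email("new@example.com")
stale_lookup = store.find_key_by_email("old@example.com")
```

After the change, `found_key` is still the legacy key string, so the pointer stored in `game` keeps working without any cascading update.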

Appengine ndb - How to ensure unique username and email without ancestors?

In my Appengine (using ndb) application I store users and both username and email need to be unique.
I also need to be able to update progress (save level if higher than previously stored level), change email and pw and delete account.
I noticed that it is not possible to query without ancestors in a transaction. But creating an ancestor is NOT a solution since that would limit the number of writes to 1 per second which is not OK if the app gets popular. So I need another solution.
Is it possible to use the Key? Yes, but that only makes the username unique. How can I make sure no one is reusing the email for another account?
You should be able to use a cross group transaction for this along with an entity that exists solely for reserving email addresses.
For your User entity, you could use the username as the key name. When creating a user, you also create an EmailReservation entity that has the user's email address as a key name.
You then use a cross-group transaction to create a new user:
from google.appengine.ext import ndb

# User and EmailReservation are assumed to be ndb.Model subclasses defined elsewhere.
@ndb.transactional(xg=True)
def create_user(user_name, email):
    user = User.get_by_id(user_name)
    email_reservation = EmailReservation.get_by_id(email)
    if user or email_reservation:
        # Either the user_name or email is already in use, so stop
        return None
    # Create the user and reserve the email address so others can't use it
    user = User(id=user_name)
    email_reservation = EmailReservation(id=email)
    # put_multi takes a single list of entities
    ndb.put_multi([user, email_reservation])
    return user

How to get "relationships" elements efficiently on datastore?

I'm developing a social login system using Google Datastore. I need to authenticate the user using their social identity and then return the information for all of their identities. The client can log in with multiple social accounts and also with an identity created on my site, so it basically has multiple social identities plus my site identity. Currently I'm running 3 queries (sequentially), which feels like a bit too much, so I'm wondering if there is a better way to do this:
// get the username registered with my site (if registered)
- userID = SELECT userID From social WHERE socialID == $socialID
// get the data of the user
- userData = SELECT * from MyData WHERE userID == userID
// get the data of any other identity it uses / has linked to the user id
- otherSocial = select * FROM social WHERE userID=userID and socialID != $socialID
You can get userData by its key, which is faster and cheaper than running a query. In order to be able to do that, you should use userId as an id for userData.
Your third query is probably needed only in some rare circumstances, e.g. when a user accesses account settings. In either case, I would not worry too much about these queries: they retrieve a small number of entities, which means that they execute very fast.
You can store some data in a session, so you don't have to retrieve it until the next session. I store a LoginOption entity, which is an equivalent of your userId and socialId. Thus, I can bypass the first query until a user logs out or a session expires.

Sharded ancestor entities in GAE

I'm working on a GAE-based project involving a large user base (possibly millions of users). We use Datastore for persistency. Users will be identified both by username and by e-mail address, so these two properties should be unique across all entities of the kind. Because Datastore doesn't support unique fields other than ID, we need transactions to ensure uniqueness of these fields when new users are registered. And in order to have transactions, User entities need to be enclosed in entity groups.
Having large entity groups is not recommended, as pointed out here. Therefore, given a possible large number of stored users, I'm thinking of putting them into multiple smaller entity groups. Each group would have a common parent with ID generated from the two unique fields (a piece of the MD5 sum for instance). Inserting a new user could look like this (in Python):
@ndb.transactional
def register_new_user(login, email, full_name):
    # validation code omitted
    group_id = a_simple_hash(login, email)
    group_key = ndb.Key('UserGroup', group_id)
    # The user needs the group key as its parent to land in that entity group
    user = User(parent=group_key, login=login, email=email, full_name=full_name)
    query = User.query(ancestor=group_key).filter(
        ndb.OR(User.login == login, User.email == email))
    if not query.get():
        user.put()
One problem I see with this solution is that it will be impossible to get a User by ID alone. We'd have to use complete entity keys.
Are there any other cons of such approach? Anyone tried something similar?
EDIT
As was pointed out to me in the comments, a hash like the one outlined above would not work properly, because it would only prevent registering users whose username and e-mail both match an existing user. It would work if the hash was computed based on a single field.
Nevertheless, I find the concept of such sharding interesting in itself and perhaps worthy of discussion.
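The flaw described in the edit can be illustrated with a small sketch. Here `a_simple_hash` is a hypothetical stand-in for the helper named in the question, implemented as the first 8 hex characters of an MD5 digest (an assumption; the question only says "a piece of the MD5 sum").

```python
import hashlib


def a_simple_hash(*fields):
    """Hypothetical helper: a piece of the MD5 sum of the given fields."""
    joined = "\x00".join(fields).encode("utf-8")
    return hashlib.md5(joined).hexdigest()[:8]


# Two registrations with the same login but different e-mail addresses.
first = ("alice", "alice@example.com")
second = ("alice", "other@example.com")

# Hashing both fields: the registrations almost certainly get different
# group ids, so an ancestor query in one group never sees the other user.
combined_first = a_simple_hash(*first)
combined_second = a_simple_hash(*second)

# Hashing the login alone: both registrations always map to the same group,
# so the duplicate login is visible to the ancestor query.
login_first = a_simple_hash(first[0])
login_second = a_simple_hash(second[0])

print("combined hash groups match:", combined_first == combined_second)
print("login-only hash groups match:", login_first == login_second)
```

A login-only hash makes duplicate logins detectable inside one group, but it does nothing for e-mail uniqueness, which would need its own grouping or a separate reservation entity.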
An e-mail address is owned by a user and is unique, so there is only a very small chance that somebody else will (try to) use the same e-mail address.
So my approach would be: get_or_insert a new login, which makes it easy to log in (by key), and then verify that the e-mail address is unique.
If it is not unique, you can discard it or ... do something else.
Entity groups have meaning for transactions. I am interested in your planned transactions, because I do not understand your entity group key hash. Which entities will be part of the entity group, and why?
If I understand your hash correctly, a user with the same login will be part of another entity group?
It looks like your entity group holds a single entity.
In my opinion you're overthinking this: what's the probability of two users registering with the same username at the same time?
Very slim. Eventual consistency is good enough for this case, as you don't need nanosecond precision...
unless you plan to have more users than facebook, with people registering every second.
Registering with the same email is virtually impossible for different users, since the check has already been done by the email provider for you!
Only a user could try to open two accounts with the same email address. Eventual consistency is good enough for this query too.
Your user entities each belong to their own entity group.
Actually in most use cases, your User is the most obvious root entity : people use the datastore because they need scalability, and most of the time huge scale is needed for user oriented apps.

What are the fields that the user table should contain from the security/authentication perspective?

When designing a user table, what would be the must-have fields from the security/user authentication point of view for a web-based application (.NET and SQL Server 2005)?
I came up with the following fields:
userID
username -- preferably email
passwordHash
onceUsePassword -- to indicate that the password should be changed after login
alternativeContactEmail
userStatusID -- FK to a lookup table with statuses like: active, disabled etc
dateCreated
dateUpdated
lastPasswordUpdate
lastLogon
-- and then the rest like :forename, surname etc which are not of the interest in this question
Am I missing something?
Is standard identity (INT) sufficient for userID or should the GUID be used instead (the userID is not going to be exposed anywhere)?
EDIT:
I am limited to the use of .NET 1.1
(don't ask...)
The salt info will be merged with passwordHash
the account would be unlocked by sending a temporary, single use system generated password to the user email address (hence onceUsePassword field)
Why not just use the built-in SQL Membership Provider if you're using SQL Server anyway? It's much better than rolling your own since it's been tested by a lot of people.
In any case, you should think about adding a salt field to your table.
Salting
Update:
.NET 1.1? I guess that answers my question. Is your application for the consumption of the general public? If so, you might want to add a way for them to unlock their accounts via a secret question.
"onceUsePassword -- to indicate that the password should be changed after login"
If you have to explain it that much, you should rename it. Something like "forceChangePasswordOnLogin".
You should add a "salt" field and salt each password, so that precomputed dictionary attacks such as rainbow tables do not work if your database is ever compromised.
I'm not sure what you mean by "The salt info will be merged with passwordHash". Does that mean that the same salt is used for all password hashes? It would make more sense to generate a random salt for each hash and store it in a separate field.
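The "random salt per hash, stored in a separate field" idea can be sketched as follows. The question targets .NET, so this Python version only illustrates the concept; PBKDF2-HMAC-SHA256 and the iteration count are assumptions chosen for the example, not something the posts above prescribe, and `hash_password` and `verify_password` are hypothetical helper names.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune for your hardware


def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


# Two users with the same password end up with different salts and hashes,
# so one precomputed table cannot cover both rows.
salt_a, hash_a = hash_password("correct horse")
salt_b, hash_b = hash_password("correct horse")
ok = verify_password("correct horse", salt_a, hash_a)
bad = verify_password("wrong guess", salt_a, hash_a)
```

In table terms, `salt` and `digest` would map to two columns (for example a salt field next to passwordHash), both stored as binary or encoded text.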
