Which Property type do I use to put a User's user_id() in the Datastore? - google-app-engine

This is why I need to put user_id()'s in the Datastore:
A User value in the datastore does not get updated if the user changes her email address. This may be remedied in a future release. Until then, you can use the User value's user_id() as the user's stable unique identifier.
http://code.google.com/appengine/docs/python/datastore/typesandpropertyclasses.html#users_User
Within the datastore, the value is equal to the email address plus the user's unique ID. If the user changes her email address, the new User value will not equal the original User value in datastore queries or when compared by the app. If your app needs a stable identifier that does not change, you can store the unique ID separately from the User value.
http://code.google.com/appengine/docs/python/users/userobjects.html
How do I do it? And how did you know (or where did you look it up)?

The usual answer is "no property": If you need a stable identifier, it's probably because you're looking users up by it, in which case you should use it as the key name for the entity, like so:
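from google.appengine.api import users
from google.appengine.ext import db
class UserInfo(db.Model):
    is_admin = db.BooleanProperty(required=True, default=False)
# The stable user_id() is the key name, so lookups never depend on the email address
user_info = UserInfo.get_or_insert(users.get_current_user().user_id())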
If you really need to store an identifier for the user elsewhere, you can use a db.StringProperty to store the User ID.
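To show why keying by user_id() gives a stable lookup, here is a minimal sketch that runs without the SDK. It assumes a plain dict as an in-memory stand-in for the datastore; FakeStore and its get_or_insert are hypothetical illustrations that mimic the behaviour of db.Model.get_or_insert, not the App Engine API itself, and the ID value is made up.

```python
import threading
from dataclasses import dataclass


@dataclass
class UserInfo:
    key_name: str          # the stable user_id(), used as the entity's key name
    email: str = ""        # mutable data, NOT part of the key
    is_admin: bool = False


class FakeStore:
    """Hypothetical in-memory stand-in for the datastore, for illustration only."""

    def __init__(self):
        self._entities = {}
        self._lock = threading.Lock()

    def get_or_insert(self, key_name, **defaults):
        # Return the entity with this key name, creating it atomically if missing.
        # Like db.Model.get_or_insert, the keyword values are used only on creation.
        with self._lock:
            entity = self._entities.get(key_name)
            if entity is None:
                entity = UserInfo(key_name=key_name, **defaults)
                self._entities[key_name] = entity
            return entity


store = FakeStore()
info = store.get_or_insert("1234567890", email="alice@example.com")
# The user later changes her email address; the key name does not change.
info.email = "alice@example.org"
same = store.get_or_insert("1234567890", email="ignored@example.com")
```

The second call finds the existing entity by its user_id() key name, so the email change never produces a second record.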

Related

Change appengine ndb key

I have a game where I've (foolishly) made the db key equal to the user's login email. I did this several years ago, so I've got quite a few users now. Some users have asked to change their email login for my game. Is there a simple way to change the key? As far as I can tell, I'd need to make a new entry with the new email, copy all the data across, and then delete the old db entry. That covers the user model, but I've also got other models, like one for each game they are involved in, that store the user key, so I'd have to loop through all of them as well and swap in the new key.
Before I embark on this I wanted to see if anyone else had a better plan. There could be several models storing that old user key so I'm also worried about the process timing out.
Keying by email does make it efficient to pull a db entry, since I know the key from their email without doing a search, but it's pretty inflexible in hindsight.
This is actually why you don't use a user's email as their key. Use ndb's default randomly generated key ids.
The efficiency you're referring to is not having to query by the user's email to retrieve the user ID. But that only happens once on user login, or from your admin screens when looking at someone's account.
You should rip the band-aid off now and do a schema migration away from this model:
1. Create a new user model (e.g. UsersV2) and clone your existing user model into it to generate new IDs.
2. On all models that reference it, add a duplicate field user_v2 = ndb.KeyProperty(UsersV2) and populate it with the new key.
3. Delete the legacy user model.
You should use the task queue to do something like this, and then you won't have to worry about the process timing out:
https://cloud.google.com/appengine/articles/update_schema
Alternatively, if you are determined to do this cascading update every time a user changes an email, you could set up a similar update_schema task for just that user.
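The three migration steps can be sketched in plain Python, processing entities in small batches the way a task-queue job would resume from a cursor between tasks. Everything here (legacy_users, games, BATCH_SIZE, the dict-based storage, the integer cursor) is a hypothetical in-memory stand-in, not ndb or the task queue API.

```python
import itertools

BATCH_SIZE = 2  # hypothetical; a real task would pick a size that fits its deadline

# Legacy users keyed by email; game records point at users by that legacy key.
legacy_users = {
    "alice@example.com": {"name": "Alice"},
    "bob@example.com": {"name": "Bob"},
    "carol@example.com": {"name": "Carol"},
}
games = [
    {"id": 1, "user": "alice@example.com"},
    {"id": 2, "user": "bob@example.com"},
    {"id": 3, "user": "alice@example.com"},
]

users_v2 = {}       # new model: auto-generated numeric IDs
legacy_to_v2 = {}   # old key -> new ID, used while references are rewritten
_next_id = itertools.count(1)


def migrate_users_batch(cursor):
    """Step 1: clone one batch of legacy users into UsersV2; return the next cursor."""
    keys = sorted(legacy_users)[cursor:cursor + BATCH_SIZE]
    for old_key in keys:
        new_id = next(_next_id)
        users_v2[new_id] = dict(legacy_users[old_key], email=old_key)
        legacy_to_v2[old_key] = new_id
    return cursor + len(keys) if keys else None


def migrate_games_batch(cursor):
    """Step 2: fill the new user_v2 field on one batch of referencing entities."""
    batch = games[cursor:cursor + BATCH_SIZE]
    for game in batch:
        game["user_v2"] = legacy_to_v2[game["user"]]
    return cursor + len(batch) if batch else None


def run_to_completion(step):
    # On App Engine each call would be a separate task that re-enqueues itself.
    cursor = 0
    while cursor is not None:
        cursor = step(cursor)


run_to_completion(migrate_users_batch)
run_to_completion(migrate_games_batch)
legacy_users.clear()  # Step 3: delete the legacy user model once nothing reads it
```

Because each batch is small and the cursor is carried forward, no single task has to touch every entity, which is what avoids the timeout worry.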
I ended up adding a new property to my user model and running a crawler to copy the string key (the email) into that new property. I changed my code to search for that property rather than the key string to get a user item. Most of my users still have keys that equal their email, but I can safely treat that string as meaningless. I can now change a user's email easily without making a new record, and my other models that have pointers to these user keys can remain unchanged.
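The approach in this last answer can be sketched as follows: the legacy key (the original email) stays as an opaque identifier, a separate email property holds the current address, and lookups go through that property. The dict storage and the find_user_by_email helper are hypothetical stand-ins for an ndb query on an indexed property, such as User.query(User.email == email).get().

```python
# Legacy users: keys are the emails they signed up with; treat them as opaque IDs.
users = {
    "alice@example.com": {"name": "Alice"},
    "bob@example.com": {"name": "Bob"},
}
# Other records keep pointing at the legacy keys and never need rewriting.
games = [{"id": 1, "user": "alice@example.com"}]


def backfill_email_property():
    """One-off crawler: copy each key string into the new 'email' property."""
    for key, user in users.items():
        user.setdefault("email", key)


def find_user_by_email(email):
    """Stand-in for a query on the 'email' property; returns the user's key or None."""
    for key, user in users.items():
        if user["email"] == email:
            return key
    return None


def change_email(old_email, new_email):
    key = find_user_by_email(old_email)
    # Note: on a real datastore this uniqueness check alone is not race-free.
    if key is None or find_user_by_email(new_email) is not None:
        return False  # unknown user, or the new address is already taken
    users[key]["email"] = new_email
    return True


backfill_email_property()
changed = change_email("alice@example.com", "alice@example.org")
```

The game record still points at "alice@example.com", which is now just an ID that happens to look like an old address.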

Appengine ndb - How to ensure unique username and email without ancestors?

In my Appengine (using ndb) application I store users and both username and email need to be unique.
I also need to be able to update progress (save level if higher than previously stored level), change email and pw and delete account.
I noticed that it is not possible to run queries without an ancestor in a transaction. But putting all users under a common ancestor is NOT a solution, since that would limit writes to that entity group to 1 per second, which is not OK if the app gets popular. So I need another solution.
Is it possible to use the key? Yes, but that only makes the username unique. How can I make sure no one reuses the email for another account?
You should be able to use a cross group transaction for this along with an entity that exists solely for reserving email addresses.
For your User entity, you could use the username as the key name. When creating a user, you also create an EmailReservation entity that has the user's email address as a key name.
You then use a cross-group transaction to create a new user:
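from google.appengine.ext import ndb

class User(ndb.Model):
    pass  # username is the key name; other properties omitted

class EmailReservation(ndb.Model):
    pass  # email address is the key name; the entity exists only to reserve it

@ndb.transactional(xg=True)
def create_user(user_name, email):
    user = User.get_by_id(user_name)
    email_reservation = EmailReservation.get_by_id(email)
    if user or email_reservation:
        # Either the user_name or email is already in use, so stop
        return None
    # Create the user and reserve the email address so others can't use it
    user = User(id=user_name)
    email_reservation = EmailReservation(id=email)
    ndb.put_multi([user, email_reservation])
    return user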
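To try the reservation pattern without the App Engine SDK, here is a small in-memory sketch. A single threading.Lock stands in for the cross-group transaction (the real datastore uses optimistic concurrency and retries, not a lock), and change_email is a hypothetical extension for the email-change requirement in the question: the old reservation is released and the new one claimed in the same atomic step.

```python
import threading

users = {}          # username -> {"email": ...}; username plays the role of the key name
reservations = {}   # email -> username; email plays the role of the key name
_txn = threading.Lock()  # stand-in for @ndb.transactional(xg=True)


def create_user(user_name, email):
    with _txn:
        if user_name in users or email in reservations:
            return None  # username or email already in use
        users[user_name] = {"email": email}
        reservations[email] = user_name
        return users[user_name]


def change_email(user_name, new_email):
    with _txn:
        user = users.get(user_name)
        if user is None or new_email in reservations:
            return False
        del reservations[user["email"]]  # release the old address
        reservations[new_email] = user_name
        user["email"] = new_email
        return True
```

Once an address is released by change_email, another account can claim it.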

Sharded ancestor entities in GAE

I'm working on a GAE-based project involving a large user base (possibly millions of users). We use Datastore for persistence. Users will be identified both by username and by e-mail address, so these two properties should be unique across all entities of the kind. Because Datastore doesn't support unique fields other than ID, we need transactions to ensure uniqueness of these fields when new users are registered. And in order to have transactions, User entities need to be enclosed in entity groups.
Having large entity groups is not recommended, as pointed out here. Therefore, given a possible large number of stored users, I'm thinking of putting them into multiple smaller entity groups. Each group would have a common parent with ID generated from the two unique fields (a piece of the MD5 sum for instance). Inserting a new user could look like this (in Python):
from google.appengine.ext import ndb

@ndb.transactional
def register_new_user(login, email, full_name):
    # validation code omitted
    group_id = a_simple_hash(login, email)  # placeholder hash helper, not shown
    group_key = ndb.Key('UserGroup', group_id)
    # The user must be created under group_key to belong to that entity group
    user = User(parent=group_key, login=login, email=email, full_name=full_name)
    query = User.query(ancestor=group_key).filter(
        ndb.OR(User.login == login, User.email == email))
    if not query.get():
        user.put()
One problem I see with this solution is that it will be impossible to get a User by ID alone. We'd have to use complete entity keys.
Are there any other cons of such approach? Anyone tried something similar?
EDIT
As has been pointed out in the comments, a hash like the one outlined above would not work properly, because it would only catch a duplicate when both the username and the e-mail match an existing user's pair. It would work if the hash was computed based on a single field.
Nevertheless, I find the concept of such sharding interesting in itself and perhaps worth discussing.
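The problem described in the EDIT can be demonstrated with a toy model in which each entity group is checked in isolation, as an ancestor query inside a transaction would be. group_id is a hypothetical stand-in for the question's a_simple_hash (the question suggests a piece of the MD5 sum; the full hex digest is used here), and ShardedRegistry is purely illustrative.

```python
import hashlib


def group_id(*fields):
    """Hypothetical stand-in for a_simple_hash(): an MD5-derived entity group ID."""
    return hashlib.md5("|".join(fields).encode("utf-8")).hexdigest()


class ShardedRegistry:
    """Toy model: uniqueness is only checked among users in the same group."""

    def __init__(self, key_fields):
        self.key_fields = key_fields  # which user fields feed the hash
        self.groups = {}              # group ID -> list of (login, email)

    def register(self, login, email):
        record = {"login": login, "email": email}
        gid = group_id(*(record[f] for f in self.key_fields))
        members = self.groups.setdefault(gid, [])
        if any(existing_login == login or existing_email == email
               for existing_login, existing_email in members):
            return False
        members.append((login, email))
        return True


by_both = ShardedRegistry(["login", "email"])
by_login = ShardedRegistry(["login"])
```

With the combined hash, "alice" with a different e-mail lands in a different group and slips through; hashing on the login alone sends every "alice" to the same group, so the duplicate is caught. E-mail uniqueness would still need a separate mechanism in that case.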
An e-mail address is owned by a user and is unique, so there is a very small chance that somebody will (try to) use the same e-mail address.
So my approach would be: get_or_insert a new login, which makes it easy to log in (by key), and next verify that the e-mail address is unique.
If it is not unique, you can discard the new login or do something else.
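The get_or_insert-then-verify approach can be sketched in plain Python under stated assumptions: a dict keyed by login stands in for entities whose key name is the login, a lock stands in for the transaction inside get_or_insert, and the e-mail check afterwards is a separate, non-transactional scan (on the real datastore this would be an eventually consistent query). All names here are hypothetical.

```python
import threading

logins = {}               # login (key name) -> email
_lock = threading.Lock()  # stand-in for get_or_insert's internal transaction


def get_or_insert_login(login, email):
    """Return (stored email, created?) for this login, atomically."""
    with _lock:
        if login in logins:
            return logins[login], False
        logins[login] = email
        return email, True


def register(login, email):
    _, created = get_or_insert_login(login, email)
    if not created:
        return "login taken"
    # Separate check afterwards: is this e-mail already used by another login?
    if any(e == email and l != login for l, e in list(logins.items())):
        del logins[login]  # discard the new login (or do something else)
        return "email taken"
    return "ok"
```

The login is guaranteed unique by the key; the e-mail check is best-effort, which the answer accepts because duplicate e-mails should be rare.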
Entity groups have meaning for transactions. I am interested in your planned transactions, because I do not understand your entity group key hash. Which entities will be part of the entity group, and why?
A user with the same login will be part of another entity group, if I understand your hash correctly.
It looks like your entity group holds a single entity.
In my opinion you're overthinking here: what's the probability of two users registering with the same username at the same time?
Very slim. Eventual consistency is good enough for this case, as you don't need nanosecond precision, unless you plan to have more users than Facebook, with people registering every second.
Registering with the same email is virtually impossible for different users, since the check has already been done by the email provider for you!
Only a user could try to open two accounts with the same email address. Eventual consistency is good enough for this query too.
Your user entities each belong to their own entity group.
Actually, in most use cases your User is the most obvious root entity: people use the datastore because they need scalability, and most of the time huge scale is needed for user-oriented apps.

Is Google Account user_id() unique for all time? (i.e. never re-used)

Is user_id() unique for all time?
That is, even if a Google Account is deleted, the user_id() of that deleted account will never appear again as the user_id() of any other account, right?
We are clarifying our understanding of this statement from
http://code.google.com/appengine/docs/python/users/userclass.html#User_user_id
user_id()
If the email address is associated with a Google account, user_id returns the unique permanent ID of the user, a string. This ID is always the same for the user regardless of whether the user changes her email address.
Yes, it is. It is a string of digits that does not change when the user changes her email address. Other properties, such as email and nickname, are mutable.
EDIT
It is unique!
From https://developers.google.com/appengine/docs/python/users/userobjects
The User object for a valid user can provide a unique ID value for the user that stays the same even if the user changes her email address. The user_id() method returns this ID, a str value.
The User object has the same form no matter which method of authentication your app uses. If you switch authentication options from Google Accounts to OpenID, existing User objects in the datastore are still valid.
Also take care, because UserProperty is mutable as well, as discussed above.
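A toy model of the difference between comparing stored User values and comparing user_id() alone. StoredUser is hypothetical, not users.User; it simply mirrors the documented rule that a stored User value compares on the email address plus the unique ID. The ID is a made-up example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredUser:
    """Toy stand-in for a stored User value: equality covers email AND user ID."""
    email: str
    user_id: str


before = StoredUser(email="alice@example.com", user_id="1234567890")
after = StoredUser(email="alice@example.org", user_id="1234567890")  # email changed

profiles = {before.user_id: {"nickname": "alice"}}
# A lookup keyed by the stable ID still works after the email change.
profile = profiles.get(after.user_id)
```

The two values no longer compare equal, but the user_id() key still finds the same profile.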

Is there any way to have key_name and id at the same time for GAE datastore entity?

In addition to the key_name I generate, I would also like to have some other property that will act as an ID (I don't want to show the key_name to the user). Can it be the ID? Or, how can I generate a unique value to use instead of an ID?
Here is what I plan to do: generate a URL using that ID and the parent key name. If the user clicks on this link, I'll need to find this datastore entity and update it.
Here is the current code.
Creation of the record:
item = Items.get_by_key_name(key_names='id' + user_id, parent=person)
if item is None:
    # Use the same key name as the lookup, or the entity will never be found again
    item = Items(key_name='id' + user_id, parent=person)
Getting the record:
item = Items.get_by_key_name(key_names='id' + user_id, parent=person)
user_id is what should be hidden.
I could be wrong, because your requirements are not clear, but I think you should just pass the key to the view using:
item.key()
then you could pass back the key to the controller and easily retrieve a given entity with:
item = Items.get(key)
Entities have exactly one of a key name or ID - never both. You could create an entity with a single ReferenceProperty pointing to your target entity, and use its ID as an identifier, but there really should be no reason not to reveal a key name to a user - a well authored app should not rely on this value remaining secret.
Note that it's trivially easy to extract the key name (and the rest of the information about a key) from the string encoded key.
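To illustrate that point, here is a toy key serialization. It is hypothetical and does NOT reproduce the real App Engine string key, which uses a different, binary serialization; it only shows that URL-safe base64 is an encoding, not encryption, so anything inside it is readable.

```python
import base64
import json


def encode_key(kind, key_name, parent=None):
    """Toy, hypothetical key encoding: JSON wrapped in URL-safe base64."""
    raw = json.dumps({"kind": kind, "name": key_name, "parent": parent}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_key(token):
    # Anyone holding the token can do this; no secret is involved.
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))


token = encode_key("Items", "id42", parent=["Person", "alice"])
```

Decoding the token recovers the key name and the parent, which is why the app should not depend on the key name staying secret.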

Resources