Change appengine ndb key - google-app-engine

I have a game where I've (foolishly) made the db key equal to the user's login email. I did this several years ago, so I've got quite a few users now. Some users have asked to change their email login for my game. Is there a simple way to change the key? As far as I can tell, I'd need to make a new entry with the new email, copy all the data across, then delete the old db entry. That covers the user model, but I've also got other models, like one for each game they are involved in, that store the user key, so I'd have to loop through all of them as well and swap in the new key.
Before I embark on this I wanted to see if anyone else had a better plan. There could be several models storing that old user key so I'm also worried about the process timing out.
Using the email as the key does make it efficient to pull a db entry, since I know the key from their email without doing a search, but it's pretty inflexible in hindsight.

This is actually why you don't use a user's email as their key. Use ndb's default randomly generated key ids.
The efficiency you're referring to is not having to query by the user's email to retrieve the user id. But that lookup only happens once, on user login, or from your admin screens when looking at someone's account.
You should rip the Band-Aid off now and do a schema migration away from this model:
1. Create a new user model (e.g. UsersV2) and clone your existing user model into it to generate new ids.
2. On all models that reference it, add a duplicate field user_v2 = ndb.KeyProperty(UsersV2) and populate it with the new key.
3. Delete the legacy user model.
You should use the taskqueue to do something like this and then you won't have to worry about the process timing out:
https://cloud.google.com/appengine/articles/update_schema
Alternatively, if you are determined to do this cascading update every time a user changes their email, you could set up a similar update_schema task for just that user.
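The migration steps in this answer can be sketched without App Engine at all. What follows is a minimal illustration, assuming plain dicts stand in for the datastore; clone_users, backfill_references and batches are hypothetical names, not ndb APIs. On App Engine each batch would presumably be handed to its own task queue task so that no single request runs long enough to time out.

```python
import itertools
import uuid


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk


def clone_users(legacy_users, batch_size=100):
    """Copy users keyed by email into a new table keyed by opaque ids.

    Returns the new table and an old-key -> new-key map.
    """
    users_v2, key_map = {}, {}
    for chunk in batches(sorted(legacy_users.items()), batch_size):
        for email, data in chunk:
            new_id = uuid.uuid4().hex  # opaque id, unrelated to the email
            users_v2[new_id] = dict(data, email=email)  # email becomes a plain field
            key_map[email] = new_id
    return users_v2, key_map


def backfill_references(records, key_map, batch_size=100):
    """Add a user_v2 field next to the legacy user field on each record."""
    for chunk in batches(records, batch_size):
        for record in chunk:
            record["user_v2"] = key_map[record["user"]]


# Hypothetical sample data: two users and the game records that point at them.
legacy = {"a@example.com": {"score": 10}, "b@example.com": {"score": 7}}
games = [{"name": "g1", "user": "a@example.com"},
         {"name": "g2", "user": "b@example.com"}]

users_v2, key_map = clone_users(legacy, batch_size=1)
backfill_references(games, key_map, batch_size=1)
```

Once every referencing record carries user_v2, the legacy field and the legacy model can be dropped in a later pass.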

I ended up adding a new property to my user model and running a crawler to copy the string key (the email) to that new property. I changed my code to search on that property rather than the key string to get a user item. Most of my users still have keys that equal their email, but I can safely ignore that and treat the key string as meaningless. I can now change a user's email easily without making a new record, and my other models that have pointers to these user keys can remain unchanged.
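A rough sketch of that idea, assuming an in-memory store in place of the datastore (UserStore and its methods are hypothetical names). The record key never changes; the email is an ordinary indexed field, which in ndb would correspond to querying on the new property instead of fetching by key.

```python
class UserStore:
    """In-memory stand-in: records keep a fixed key, email is an indexed field."""

    def __init__(self):
        self._records = {}       # key -> record dict
        self._key_by_email = {}  # email -> key, playing the role of a property index

    def add(self, key, email, **data):
        if email in self._key_by_email:
            raise ValueError("email already in use")
        self._records[key] = dict(data, email=email)
        self._key_by_email[email] = key

    def get_by_email(self, email):
        """Return (key, record) for the email, or None if nobody uses it."""
        key = self._key_by_email.get(email)
        return (key, self._records[key]) if key is not None else None

    def change_email(self, old_email, new_email):
        """Point the index at the new email; the record key stays the same."""
        if new_email in self._key_by_email:
            raise ValueError("email already in use")
        key = self._key_by_email.pop(old_email)
        self._records[key]["email"] = new_email
        self._key_by_email[new_email] = key
        return key


store = UserStore()
# Legacy record: the key happens to equal the original email.
store.add("old@example.com", "old@example.com", score=3)
game = {"name": "g1", "user_key": "old@example.com"}

store.change_email("old@example.com", "new@example.com")
key, record = store.get_by_email("new@example.com")
```

After the change, game["user_key"] still resolves to the same record even though the key string no longer matches the user's current email.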

Related

Separating collections in mongodb database that share a 1-to-1 relationship

I am using mongodb as the database for a project I've been working on and in the database I have a "user" collection and an "account" collection. Every user has one account and every account has a "user" field that is the _id of the corresponding user. The reason I separated these into two collections is because I thought it made sense to keep the user's sensitive data (password, email, legal name, etc.) separate from the account data (things like interests, followers, username, etc.). Also the account collection has a lot of fields so it just seemed easier to not over-saturate the "user" collection with data.
So, my question is - Now that I essentially have 2 collections pointing to the same user, should I use the "user._id" to query both users and accounts? Since each account has a unique "user" field, is there a reason to query those accounts with their own _id property? It seems odd to keep track of two different _id's on the frontend and conditionally send either the user._id or account._id.
The two main drawbacks I have found when using the user._id to query both users and accounts are:
When querying account data, I have to almost always make sure I send the "user" field so I have that id on the front end.
If in the future, I wanted to add the ability for users to create multiple accounts, I would have to change the code to now fetch account data using the "account._id".
Hopefully that all makes sense, and maybe it doesn't even make sense for me to separate those collections. Thank you to anyone who can help!

Sharded ancestor entities in GAE

I'm working on a GAE-based project involving a large user base (possibly millions of users). We use Datastore for persistency. Users will be identified both by username and by e-mail address, so these two properties should be unique across all entities of the kind. Because Datastore doesn't support unique fields other than ID, we need transactions to ensure uniqueness of these fields when new users are registered. And in order to have transactions, User entities need to be enclosed in entity groups.
Having large entity groups is not recommended, as pointed out here. Therefore, given a possible large number of stored users, I'm thinking of putting them into multiple smaller entity groups. Each group would have a common parent with ID generated from the two unique fields (a piece of the MD5 sum for instance). Inserting a new user could look like this (in Python):
from google.appengine.ext import ndb

@ndb.transactional
def register_new_user(login, email, full_name):
    # validation code omitted
    group_id = a_simple_hash(login, email)
    group_key = ndb.Key('UserGroup', group_id)
    # The user must be created under the group key,
    # otherwise the ancestor query below never sees it.
    user = User(parent=group_key, login=login, email=email, full_name=full_name)
    query = User.query(ancestor=group_key).filter(
        ndb.OR(User.login == login, User.email == email))
    if not query.get():
        user.put()
One problem I see with this solution is that it will be impossible to get a User by ID alone. We'd have to use complete entity keys.
Are there any other cons of such approach? Anyone tried something similar?
EDIT
As was pointed out to me in the comments, a hash like the one outlined above would not work properly: it would only prevent registering a user whose e-mail and username both match those of an existing user. It would work if the hash were computed from a single field.
Nevertheless, I find the concept of such sharding interesting in itself and perhaps worthy of discussion.
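To make the single-field point concrete, here is a small sketch of a shard id computed from one field only. shard_id, the shard count of 256 and the lower-casing step are assumptions for illustration and are not taken from the question. Because only the e-mail feeds the hash, two registrations with the same e-mail always land in the same group no matter which login they pick; logins would need a sharded lookup of their own.

```python
import hashlib


def shard_id(value, shards=256):
    """Map one unique field to a stable shard number in [0, shards)."""
    # Normalising first is an assumption: it treats "A@Example.com" and
    # "a@example.com" as the same address.
    normalized = value.strip().lower().encode("utf-8")
    digest = hashlib.md5(normalized).hexdigest()
    return int(digest[:8], 16) % shards


first = shard_id("a@example.com")
second = shard_id(" A@Example.com ")  # same address, different login elsewhere
```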
An e-mail address is owned by a user and is unique, so there is only a very small chance that somebody will (try to) use the same e-mail address.
So my approach would be: get_or_insert a new login, which makes it easy to log in (by key), and then verify that the e-mail address is unique.
If it is not unique, you can discard the registration or do something else.
Entity groups have meaning for transactions. I am interested in your planned transactions, because I do not understand your entity group key hash. Which entities will be part of the entity group, and why?
If I understand your hash correctly, a user with the same login will end up in a different entity group.
It looks like your entity group holds a single entity.
In my opinion you're overthinking this: what's the probability of two users registering with the same username at the same time?
Very slim. Eventual consistency is good enough for this case, as you don't need nanosecond precision...
unless you plan to have more users than facebook, with people registering every second.
Registering with the same email is virtually impossible for different users, since the check has already been done by the email provider for you!
Only a user could try to open two accounts with the same email address. Eventual consistency is good enough for this query too.
Your user entities each belong to their own entity group.
Actually in most use cases, your User is the most obvious root entity : people use the datastore because they need scalability, and most of the time huge scale is needed for user oriented apps.

How to prevent user to access other users' data?

PROBLEM
User authenticated into the application
Simple database schema: User ---> Document ---> Item
API to access Document Items
If the logged-in user knows the ids of items that belong to some other user, they can access them.
I would like to prevent this behavior.
SOLUTION
The first solution I found is to add a userid field to every record in every table, and to check on every query whether the record belongs to the logged-in user.
Is this a good solution? Do you know of a better design pattern to prevent users from accessing other users' data?
Thanks
If the documents belong to a user, adjust your queries so that only items that belong to the user's documents are retrieved. No need to add userIDs to the items themselves.
If you need to expose IDs to the users, make those IDs GUIDs instead of consecutive numbers. While not a perfect solution, it makes it much harder to guess the IDs of other users' items.
If you're using Oracle, there's VPD, Virtual Private Database. You can use that to restrict access for users.
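The first suggestion, scoping every item query through the owning document, might look like the sketch below. It uses sqlite3 from the Python standard library as a stand-in database; the table names, column names and get_item are assumptions, not taken from the question. The join on documents.user_id is what keeps one user from reading another user's items, even when the item id is known.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id)
    );
    CREATE TABLE items (
        id INTEGER PRIMARY KEY,
        document_id INTEGER REFERENCES documents(id),
        body TEXT
    );
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO documents VALUES (10, 1), (20, 2);
    INSERT INTO items VALUES (100, 10, 'alice item'), (200, 20, 'bob item');
""")


def get_item(conn, item_id, current_user_id):
    """Return the item only if its parent document belongs to the current user."""
    return conn.execute(
        """
        SELECT items.id, items.body
        FROM items
        JOIN documents ON documents.id = items.document_id
        WHERE items.id = ? AND documents.user_id = ?
        """,
        (item_id, current_user_id),
    ).fetchone()  # None when the item is missing or owned by someone else


own_item = get_item(conn, 100, 1)    # alice reading her own item
other_item = get_item(conn, 200, 1)  # alice guessing bob's item id
```

Returning the same "not found" result for missing and foreign items also avoids confirming that a guessed id exists.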

Yii Check if entry exists in database before creating new one

I'm pretty new to yii.
I have a form that creates and saves multiple AR models to the database. The problem is that in many cases new entries are actually identical to existing ones. I would like to reduce or eliminate this kind of data redundancy.
The form has 3 entities:
- the main model
- client model
- product model
Many times, product and client will already exist in the database.
Product and Client are referenced through foreign keys in the main model.
I want to know how would it be possible to do the following:
as I type a client's name or phone number, yii searches in the client table and displays results as suggestions, through ajax.
if I select one of the suggestions, the Client AR should be populated with that database entry.
when the form is submitted:
if an existing client was selected, use that client's id inside the main model. Do not create a duplicate client in the database.
if client wasn't found in the existing records, create a new one with the provided form data.
I apologize for the bad formatting, this is my second time posting a question. If I wasn't very clear in what I am looking for, please ask for clarification. This is something I would really like to learn.
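The server-side half of this flow is framework independent, so here is a minimal sketch of it in Python with sqlite3 rather than Yii. Every name in it (suggest_clients, find_or_create_client, the client and orders tables) is hypothetical, and treating the phone number as the unique identifier of a client is an assumption the question does not state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
    "phone TEXT NOT NULL UNIQUE)"
)
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
    "client_id INTEGER NOT NULL REFERENCES client(id))"
)


def suggest_clients(conn, term, limit=10):
    """Return clients whose name or phone starts with the typed term."""
    pattern = term + "%"  # note: % and _ typed by the user act as wildcards here
    return conn.execute(
        "SELECT id, name, phone FROM client "
        "WHERE name LIKE ? OR phone LIKE ? ORDER BY name LIMIT ?",
        (pattern, pattern, limit),
    ).fetchall()


def find_or_create_client(conn, selected_id, name, phone):
    """Reuse the selected or matching client; insert only when none exists."""
    if selected_id is not None:
        row = conn.execute(
            "SELECT id FROM client WHERE id = ?", (selected_id,)
        ).fetchone()
        if row:
            return row[0]
    row = conn.execute("SELECT id FROM client WHERE phone = ?", (phone,)).fetchone()
    if row:
        return row[0]
    cur = conn.execute(
        "INSERT INTO client (name, phone) VALUES (?, ?)", (name, phone)
    )
    return cur.lastrowid


first = find_or_create_client(conn, None, "Ann", "555-0100")   # creates the client
again = find_or_create_client(conn, None, "Ann", "555-0100")   # same data retyped
picked = find_or_create_client(conn, first, "ignored", "n/a")  # suggestion selected
conn.execute("INSERT INTO orders (client_id) VALUES (?)", (picked,))
```

The returned id is what the main model would store as its foreign key; the same pattern would apply to the product model.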

Database design for opt-in emails

I have a table of users in SQL Server with all the contact details, personal details etc. When each user signs up to my website they will be given the option to opt-in to 5 different types of emails like:
I wish to receive emails about new things
I wish to receive the monthly newsletter
etc. I am trying to decide the best way to store this information in a database. My current thinking is to have a separate table with 5 columns (one for each opt-in), each holding a bool/bit value.
The information won't be required regularly; it will only be needed when we want to send mail to a user. Are there any better ways or best practices for doing something like this?
The problem with your proposed design is that it becomes difficult to add new email types in the future; you only have 5 now, but what happens when you add a sixth or seventh?
Instead, I would propose something like:
User Table:
    UserID (Primary Key)
    User Attributes
EmailTemplate Table:
    EmailTemplateID (Primary Key)
    Email Template Attributes
UserEmailTemplates Table:
    UserID
    EmailTemplateID
You can easily add new templates, and associate them with users.
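The three-table layout above can be tried out quickly with sqlite3 from the Python standard library. This is only a sketch: the question targets SQL Server, and the snake_case names, the sample rows and the recipients helper are assumptions made for the example. A row in user_email_templates means "this user opted in to this email type".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE email_templates (
        email_template_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE user_email_templates (
        user_id INTEGER NOT NULL REFERENCES users(user_id),
        email_template_id INTEGER NOT NULL
            REFERENCES email_templates(email_template_id),
        PRIMARY KEY (user_id, email_template_id)
    );
    INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO email_templates VALUES (1, 'new_things'), (2, 'monthly_newsletter');
    INSERT INTO user_email_templates VALUES (1, 1), (1, 2), (2, 2);
""")


def recipients(conn, template_name):
    """Return the addresses of all users opted in to the named email type."""
    rows = conn.execute(
        """
        SELECT u.email
        FROM users u
        JOIN user_email_templates uet ON uet.user_id = u.user_id
        JOIN email_templates t ON t.email_template_id = uet.email_template_id
        WHERE t.name = ?
        ORDER BY u.email
        """,
        (template_name,),
    ).fetchall()
    return [email for (email,) in rows]


# Adding a new email type is two INSERTs, not a schema change.
conn.execute("INSERT INTO email_templates VALUES (3, 'product_survey')")
conn.execute("INSERT INTO user_email_templates VALUES (2, 3)")
```

Opting out is a DELETE on user_email_templates, so the users table never has to change when the set of email types does.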
