Separating collections in a MongoDB database that share a 1-to-1 relationship

I am using mongodb as the database for a project I've been working on and in the database I have a "user" collection and an "account" collection. Every user has one account and every account has a "user" field that is the _id of the corresponding user. The reason I separated these into two collections is because I thought it made sense to keep the user's sensitive data (password, email, legal name, etc.) separate from the account data (things like interests, followers, username, etc.). Also the account collection has a lot of fields so it just seemed easier to not over-saturate the "user" collection with data.
So, my question is - Now that I essentially have 2 collections pointing to the same user, should I use the "user._id" to query both users and accounts? Since each account has a unique "user" field, is there a reason to query those accounts with their own _id property? It seems odd to keep track of two different _id's on the frontend and conditionally send either the user._id or account._id.
The two main drawbacks I have found when using the user._id to query both users and accounts are:
When querying account data, I have to almost always make sure I send the "user" field so I have that id on the front end.
If in the future, I wanted to add the ability for users to create multiple accounts, I would have to change the code to now fetch account data using the "account._id".
Hopefully that all makes sense, and maybe it doesn't even make sense for me to separate those collections. Thank you to anyone who can help!

Related

In MongoDB, should I store the user id in the _id field or in id field created by myself?

Each user has a unique ID and an email and password. Of course, email is also unique.
Currently, I have an id field in my MongoDB user collection. For example, below is a user document.
{
  _id: ObjectId(12345678),
  id: 1,
  email: "hello@hello.com",
  password: "xxxxxxxxxx"
}
I am thinking of getting rid of the id field and using the _id to store my user id. For example:
{
  _id: 1,
  email: "hello@hello.com",
  password: "xxxxxxxxxx"
}
Is that a good idea? Any danger of doing so? I think this is going to reduce the complexity and redundancy.
This is a generic topic not only related to MongoDB. Google for "surrogate key vs natural key" or "surrogate key vs primary key", you will find many articles with pros and cons for both sides.
You say "email is always unique" - is this really true? Also in future when your application evolves?
In your case I would suggest to use a dedicated field instead of the _id field. Think about potential situations now and in the future:
Some client drivers may handle _id field in a special way. Maybe they suppress it or run other "stupid" stuff, e.g. execute _id.getTimestamp() internally and fail.
A user may request to change his email. The _id cannot be modified!
You may have the requirement to merge several accounts into a new one
You may need more than one email for an account (e.g. for password verification/reset)
What happens when an account is deleted? For regulatory or other reasons you may not be allowed to delete the data physically from your database, so you just lock the account. But then the email is still blocked for other uses.
Yes, what you think is right. You should use the automatically generated _id from MongoDB to identify the user uniquely; there is no need to write a separate id of your own.
Two advantages:
Auto-generated ids are unique within the collection, so there is no problem of redundancy (repetition).
It is easy to maintain.

Query MongoDB to find array that contains elements from another array

For my users table, each user has an emails<Array> property allowing them to associate multiple emails with their accounts. I want to make sure each email is still unique, so whether they're creating a new account or updating an existing one I need to query the DB to check if that email address already exists.
I know I can use users.find({ emails: email }) for each email and then check the _id of any match (on update) to make sure it is the same user, but that means looping over the submitted emails outside the query.
I'm curious if there's a method I'm not seeing for querying to identify if any emails being submitted match any emails from across the table in the db?
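One possible approach, sketched here under assumptions: MongoDB's $in operator, applied to an array field, matches documents in which at least one array element equals one of the given values, so a single filter can cover all submitted emails. The helper name build_email_conflict_query and its exclude_id parameter are hypothetical; the function only builds the filter document, which could then be passed to something like users.find_one(...).

```python
def build_email_conflict_query(emails, exclude_id=None):
    """Build a MongoDB filter matching any user that already owns one of `emails`.

    `exclude_id` (hypothetical parameter) is the _id of the user being updated,
    so that the user's own document is not reported as a conflict.
    """
    query = {"emails": {"$in": list(emails)}}
    if exclude_id is not None:
        query["_id"] = {"$ne": exclude_id}
    return query


# Creating a new account: any match means at least one email is taken.
new_account_filter = build_email_conflict_query(["a@example.com", "b@example.com"])

# Updating an existing account: ignore the document being updated.
update_filter = build_email_conflict_query(["a@example.com"], exclude_id=42)
```

If the filter returns any document, at least one of the submitted emails is already in use by another user. A unique index on the emails field may be worth considering as a database-level safeguard on top of this check.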

Sharded ancestor entities in GAE

I'm working on a GAE-based project involving a large user base (possibly millions of users). We use Datastore for persistency. Users will be identified both by username and by e-mail address, so these two properties should be unique across all entities of the kind. Because Datastore doesn't support unique fields other than ID, we need transactions to ensure uniqueness of these fields when new users are registered. And in order to have transactions, User entities need to be enclosed in entity groups.
Having large entity groups is not recommended, as pointed out here. Therefore, given a possible large number of stored users, I'm thinking of putting them into multiple smaller entity groups. Each group would have a common parent with ID generated from the two unique fields (a piece of the MD5 sum for instance). Inserting a new user could look like this (in Python):
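from google.appengine.ext import ndb

@ndb.transactional
def register_new_user(login, email, full_name):
    # validation code omitted
    group_id = a_simple_hash(login, email)
    group_key = ndb.Key('UserGroup', group_id)
    # The user is created under the group key so the ancestor query below can see it.
    user = User(parent=group_key, login=login, email=email, full_name=full_name)
    # Look for an existing user in the same group with the same login or e-mail.
    query = User.query(ancestor=group_key).filter(
        ndb.OR(User.login == login, User.email == email))
    if not query.get():
        user.put()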
One problem I see with this solution is that it will be impossible to get a User by ID alone. We'd have to use complete entity keys.
Are there any other cons of such approach? Anyone tried something similar?
EDIT
As was pointed out to me in the comments, a hash like the one outlined above would not work properly, because two users sharing only a login or only an e-mail would usually land in different groups, so the check would only catch duplicates where both fields match. It would work if the hash was computed from a single field.
Nevertheless, I find the concept of such sharding interesting in itself and perhaps worthy of discussion.
An e-mail address is owned by a user and is unique, so there is only a very small chance that somebody else will (try to) use the same e-mail address.
So my approach would be: get_or_insert a new login, which makes it easy to log in (by key), and then verify that the e-mail address is unique.
If it is not unique you can discard the registration or do something else.
Entity groups have meaning for transactions. I'm interested in your planned transactions, because I do not understand your entity group key hash. Which entities will be part of the entity group, and why?
If I understand your hash correctly, a user with the same login will be part of another entity group?
It looks like your entity group holds a single entity.
In my opinion you're overthinking this: what's the probability of two users registering with the same username at the same time?
Very slim. Eventual consistency is good enough for this case, as you don't need nanosecond precision...
unless you plan to have more users than Facebook, with people registering every second.
Registering with the same email is virtually impossible for different users, since the check has already been done by the email provider for you!
Only a user could try to open two accounts with the same email address. Eventual consistency is good enough for this query too.
Your user entities each belong to their own entity group.
Actually in most use cases, your User is the most obvious root entity : people use the datastore because they need scalability, and most of the time huge scale is needed for user oriented apps.

How to prevent user to access other users' data?

PROBLEM
User authenticated into the application
Simple database schema: User ---> Document ---> Item
API to access Document Items
If the logged-in user knows the ids of items that belong to some other user, they can access them.
I would like to prevent this behavior.
SOLUTION
The first solution I found is to add a userid field to every record in every table, and to check in every query whether the record belongs to the logged-in user.
Is this a good solution? Do you know a better design pattern to prevent a user from accessing other users' data?
Thanks
If the documents belong to a user, adjust your queries so that only items that belong to the user's documents are retrieved. No need to add userIDs to the items themselves.
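As a rough illustration of that idea, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table names (users, documents, items), the column names, and the helper get_item_for_user are all assumptions made for the example; the point is only that the ownership check lives in the query via a join on the document's owner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE documents (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id));
    CREATE TABLE items (id INTEGER PRIMARY KEY, document_id INTEGER REFERENCES documents(id), body TEXT);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO documents VALUES (10, 1), (20, 2);
    INSERT INTO items VALUES (100, 10, 'alice item'), (200, 20, 'bob item');
""")


def get_item_for_user(conn, item_id, logged_in_user_id):
    """Return the item body only if the item sits in a document owned by the user."""
    row = conn.execute(
        """
        SELECT items.body
        FROM items
        JOIN documents ON documents.id = items.document_id
        WHERE items.id = ? AND documents.user_id = ?
        """,
        (item_id, logged_in_user_id),
    ).fetchone()
    return row[0] if row else None


own = get_item_for_user(conn, 100, 1)      # alice reads her own item
foreign = get_item_for_user(conn, 200, 1)  # alice asks for bob's item by id
```

Here the second call returns nothing even though the item id is valid, because the join finds no document owned by the requesting user.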
If you need to expose IDs to the users, make those IDs GUIDs instead of consecutive numbers. While not a perfect solution, it makes it much harder to guess the IDs of other users' items.
If you're using Oracle, there's VPD, Virtual Private Database. You can use that to restrict access for users.

Model agency database opinion

I would like to build a model agency site based on CodeIgniter, but I'm a bit stuck with the database, specifically the registration part.
I would like to allow users to sign up as a model, photographer, agency, or make-up artist.
So could someone give me an opinion on how to design the database? Should I separate the models, photographers, agencies, and artists into different tables, and only ask for basic info on the registration form (name, password, email, D.O.B.), or is there another way?
Thank you
You should use entity sub-typing with a parent type of "USER", which will contain your basic information, and with sub-types of "MODEL", "AGENCY", "PHOTOGRAPHER", "MAKEUP_ARTIST". This will allow you to have a better user experience for the inevitable case where there is overlap. I'm sure there are photographers who have agencies and agencies that do make-up etc. It would be much better for these types of users to have a single user ID and password despite having different types of profiles.
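A minimal sketch of what such sub-typing could look like, again using Python's sqlite3 module with an in-memory database. Every table and column name here (users, model_profiles, photographer_profiles, height_cm, studio_name) is hypothetical and only two of the four sub-types are shown; each sub-type table reuses the parent user's id as its own primary key, so one login can carry several profiles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Parent type: the basic information collected at registration.
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        email TEXT NOT NULL UNIQUE,
        password_hash TEXT NOT NULL,
        date_of_birth TEXT
    );
    -- One table per sub-type; each row reuses the parent user's id.
    CREATE TABLE model_profiles (
        user_id INTEGER PRIMARY KEY REFERENCES users(id),
        height_cm INTEGER
    );
    CREATE TABLE photographer_profiles (
        user_id INTEGER PRIMARY KEY REFERENCES users(id),
        studio_name TEXT
    );
""")

# A single user who is both a model and a photographer.
conn.execute(
    "INSERT INTO users VALUES (1, 'Sam', 'sam@example.com', 'not-a-real-hash', '1990-01-01')"
)
conn.execute("INSERT INTO model_profiles VALUES (1, 180)")
conn.execute("INSERT INTO photographer_profiles VALUES (1, 'Sam Studio')")

# Collect the profile types that exist for user 1.
profile_types = []
for table, label in (("model_profiles", "MODEL"), ("photographer_profiles", "PHOTOGRAPHER")):
    if conn.execute(f"SELECT 1 FROM {table} WHERE user_id = ?", (1,)).fetchone():
        profile_types.append(label)
```

The registration form would then only need to fill the users table, with the type-specific details added to the matching profile table later.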
Add a drop-down to the sign-up form for the type of person registering, populated from a separate table (e.g. person_type), and save the person's basic details in another table together with the ID of the chosen person_type row.
You can then write a CodeIgniter model for getting, inserting and updating these records.

Resources