When creating a user registration system, I'll be using the user's email as the username.
When creating the database schema, should I then treat them as 2 separate fields or should I just treat them as 1?
e.g.
USER_TABLE {USER_ID, USERNAME, FNAME, LNAME, EMAIL}
or
USER_TABLE {USER_ID, USERNAME, FNAME, LNAME}
I would think the only argument for storing the 2 fields separately (even when they are the same) is some kind of future-proofing, in case we ever decide to let users create a username that is not an email?
Thoughts?
I would avoid premature optimization and use only one field. If you ever need to have 2 fields, it's easy to create and populate one.
If you believe that there is a reasonable possibility that you might want to allow users to start creating user names that aren't email addresses, then keeping separate email and user_id columns is a good idea. This is especially true if you are building a new system.
I have a maxim in system design: "If someone has thought of the idea, it will eventually happen."
By this I mean that it is often a mistake to think "our business rules will never change in area X". Business rules change - a lot.
You can always add a new field to your table later on to distinguish email from user_id. Adding a column is easy. What will be much harder will be changing all of the code you've written that uses the original user_id column for email purposes. This is why I say it's a good idea to build the distinction between user_id and email into your code from the outset.
You could include both attributes but add a constraint ("check" constraint) to guarantee that username and email are the same. Business logic that requires user name or email can then be written against the appropriate attribute and if and when you need to make them independent you can just drop the constraint.
Don't forget the uniqueness constraint for user name and/or email.
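A minimal sketch of the check-constraint idea, using Python's built-in sqlite3 module. The table name user_account, the constraint name username_is_email and the helper register are assumptions made up for this illustration. How the constraint is dropped later depends on the database you use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE user_account (
        user_id  INTEGER PRIMARY KEY,
        username TEXT NOT NULL UNIQUE,
        email    TEXT NOT NULL UNIQUE,
        fname    TEXT,
        lname    TEXT,
        -- Current business rule: the username is the email address.
        CONSTRAINT username_is_email CHECK (username = email)
    )
    """
)


def register(username, email):
    """Insert a user; return True on success, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO user_account (username, email) VALUES (?, ?)",
                (username, email),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok_same = register("ann@example.com", "ann@example.com")       # satisfies the rule
ok_different = register("bob", "bob@example.com")              # violates the CHECK
ok_duplicate = register("ann@example.com", "ann@example.com")  # violates UNIQUE
```

Code that needs a login name reads username and code that sends mail reads email, so nothing has to be rewritten on the day the two are allowed to differ.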
I've read here that using email address field as a primary key for managing user database is a very bad idea.
How, and why? The book doesn't delve into the reasons.
How can using email field as a primary key for a table be so deleterious?
Are there some horrible long-term implications that I do not see?
Edit:
This question is about performance issues of string comparison, however, that does not concern me (at least for this question). I am interested in long-term implications of using email as a primary key.
From experience, does it generally cause problems in the future?
Well, I guess the most obvious (not performance-related) reason is that users may want (or need) to change their email addresses.
If the email address is the primary identifier for user accounts this can get confusing pretty quickly.
From a domain modeling view, email addresses are commonly handled as attributes of persons/users, just as a user name is. While it is probably reasonable to disallow user name changes, email addresses are rather likely to change at some point (the user loses access to the account, the organization that maintained the account retires, etc.).
Also, an email address does not need to be eternally assigned to the same real-life person. joe@example.com could be owned by "Joe Miller" in 2005, "Joe Carlos" in 2013, and by "Joeberto Joeman" from 2020 onwards.
This possible need for change is IMO the main reason why email addresses don't make good primary keys.
There are a few attributes you look for in a primary key.
The problems with "email address" are
it's not possible to guarantee it's unique - an email address may be used by a group of people at the same time, or different people over time.
it's not immutable - the same person may change emails over time; this would require you to update all the tables with foreign key relationships
it does not uniquely identify a person - one person may have multiple email addresses
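To make the immutability point concrete, here is a small sketch with Python's sqlite3 module, assuming a hypothetical users table keyed by a surrogate user_id and an orders table that references it. With this layout an email change touches one row, and the rows that reference the user are left alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,   -- surrogate key, never changes
        email   TEXT NOT NULL UNIQUE   -- mutable attribute of the user
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        user_id  INTEGER NOT NULL REFERENCES users(user_id),
        item     TEXT NOT NULL
    );
    """
)

cur = conn.execute("INSERT INTO users (email) VALUES (?)", ("joe@example.com",))
joe_id = cur.lastrowid
conn.execute("INSERT INTO orders (user_id, item) VALUES (?, ?)", (joe_id, "book"))

# The user changes email: exactly one row in one table is updated.
conn.execute(
    "UPDATE users SET email = ? WHERE user_id = ?", ("joe@example.org", joe_id)
)

# The order still points at the same user through the unchanged user_id.
rows = conn.execute(
    "SELECT u.email, o.item FROM orders o JOIN users u ON u.user_id = o.user_id"
).fetchall()
```

Had email been the primary key, the same change would have required updating every referencing table as well.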
So in one of my school projects, we have to create a database for a pseudo e-commerce website. The project instructions ask us to implement our database in Boyce–Codd normal form, but I think there is some ambiguity about this normal form.
Let's say that we implement the entity Users like this:
Users(*email, username, password, some_other_fields)
(note: * means primary key)
First of all, if I understand BCNF correctly, this entity isn't in BCNF. If usernames are unique as well as emails, then we can also define this entity like this:
Users(*username, email, password, some_other_fields)
My first question is: how do I put this entity into Boyce–Codd normal form?
Then I have another issue with this BCNF form: the missing id. Assume a user can change their username and their email. The primary key will then change as well. My issue is that I don't really have a value that stays constant over time to identify an element of my entity. This causes problems with logging, for example: assuming we log a user's actions using the primary key, if foo@smth.com changes their email to foo2@smth.com, we can end up with logs like these:
[foo@smth.com] : action xxx
[foo@smth.com] : action yyy
[foo2@smth.com] : action zzz
Then if we don't catch the email change, all our earlier logs mean nothing: we don't know who foo@smth.com is.
So my second question is: don't you think that using an id that stays constant over time (an integer, for example) is more secure?
Uniqueness is not enough for BCNF. BCNF is about functional dependency, that is, whether the attributes are functionally dependent on the key.
In this case the attributes cannot depend on the email. Emails can be changed, become inactive, or be reclaimed by someone else. Therefore, being unique is not enough to justify it as a candidate key. Username may be more dependable if the functionality prevents the user name from being changed.
Functional dependency inherently depends on functional design. If the application you are creating the table for assumes that usernames will never be allowed to change, then the attributes can depend on username as a candidate key. If the functional design does allow the username to be changed, then you need to introduce or combine a key that is both unique and functionally dependable.
Additional, introduced unique ids are not 'inherently' more 'secure' than username here. But they 'feel' or 'become' secure, because presumably the functionality and functional design do not expect the id to change. Again, if your functional design allows that id to be changed, then it will not remain secure. Ultimately it all depends on your functionality, requirements, and how your attributes are expected to behave according to that functional spec.
If you do have to introduce an ID because you are not satisfied with the dependability of username, then consider a GUID rather than an int/integer, for many reasons such as the following:
int/integer values are typically periodic, that is, they wrap around after a platform-specific limit; for example, a 16-bit signed int ranges from -32768 to 32767. GUIDs may repeat too, but the chance is statistically far smaller.
For operations that take place in different subsystems and need to be synced later, non-unique ids may get created. Consider two shops of a chain that can register customers in offline mode and later sync to the cloud. The first store creates a customer with an ID of, say, 3000, and the second store does the same. When their data syncs, you have to use a different composite key structure to accommodate that. GUIDs, being far more likely to be unique, can solve this.
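The two-store scenario can be sketched in a few lines of Python. The dictionaries standing in for each store's local database and the helper register_offline are assumptions for illustration only; the ids come from the standard uuid module (random, version 4 UUIDs).

```python
import uuid


def register_offline(store_customers, name):
    """Create a customer record locally, keyed by a random (version 4) UUID."""
    customer_id = str(uuid.uuid4())
    store_customers[customer_id] = name
    return customer_id


# Integer ids handed out independently by each store collide on merge.
int_store_a = {3000: "Alice"}
int_store_b = {3000: "Bob"}
merged_ints = {**int_store_a, **int_store_b}  # one record overwrites the other

# UUID ids generated independently can be merged without a composite key.
uuid_store_a = {}
uuid_store_b = {}
id_a = register_offline(uuid_store_a, "Alice")
id_b = register_offline(uuid_store_b, "Bob")
merged_uuids = {**uuid_store_a, **uuid_store_b}
```

A collision between two random UUIDs is possible in principle but extremely unlikely, which is the property the answer relies on.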
Currently my users table has the below fields
Username
Password
Name
Surname
City
Address
Country
Region
TelNo
MobNo
Email
MembershipExpiry
NoOfMembers
DOB
Gender
Blocked
UserAttempts
BlockTime
Disabled
I'm not sure if I should put the address fields in another table. I have heard that I will be breaking 3NF if I don't although I can't understand why. Can someone please explain?
There are several points that are definitely not 3NF; and some questionable ones in addition:
Could there be multiple addresses per user?
Is an address optional or mandatory?
Does the information in City, Country, Region duplicate that in Address?
Could a user have multiple TelNos?
Is a TelNo optional or mandatory?
Could a user have multiple MobNos?
Is a MobNo optional or mandatory?
Could a user have multiple Emails?
Is an Email optional or mandatory?
Is NoOfMembers calculated from the count of users?
Can there be more than one UserAttempts?
Can there be more than one BlockTime per user?
If the answer to any of these questions is yes, then it indicates a problem with 3NF in that area. The reason for 3NF is to remove duplication of data; to ensure that updates, insertions and deletions leave the data in consistent form; and to minimise the storage of data - in particular there is no need to store data as "not yet known/unknown/null".
In addition to the questions asked here, there is also the question of what constitutes the primary key for your table - I would guess it is something to do with user, but name and the other information you give is unlikely to be unique, so will not suffice as a PK. (If you think name plus surname is unique are you suggesting that you will never have more than one John Smith?)
EDIT:
In the light of further information that some fields are optional, I would suggest that you separate out the optional fields into different tables, and establish 1-1 links between the new tables and the user table. This link would be established by creating a foreign key in the new table referring to the primary key of the user table. As you say none of the fields can have multiple values then they are unlikely to give you problems at present. If however any of these change, then not splitting them out will give you problems in upgrading the application and the data to support the application. You still need to address the primary key issue.
As long as every user has one address and every address belongs to one user, they should go in the same table (a 1-to-1 relationship). However, if users aren't required to enter addresses (an optional relationship), a separate table would be appropriate. Also, in the odd case that many users share the same address (e.g. they're convicts in the same prison), you have a 1-to-many relationship, in which case a separate table would be the way to go. EDIT: And yes, as someone pointed out in the comments, if users have multiple addresses (a 1-to-many the other way around), there should also be separate tables.
Just as a point that I think might help someone with this question, I once had a situation where I put addresses right in the user/site/company/etc tables because I thought, why would I ever need more than one address for them? Then after we completed everything it was brought to my attention by a different department that we needed the possibility of recording both a shipping address and a billing address.
The moral of the story is, this is a frequent requirement, so if you think you ever might want to record shipping and billing addresses, or can think of any other type of address you might want to record for a user, go ahead and put it in a separate table.
In today's age, I think phone numbers are a no brainer as well to be stored in a separate table. Everyone has mobile numbers, home numbers, work numbers, fax numbers, etc., and even if you only plan on asking for one, people will still put two in the field and separate them by a semi-colon (trust me). Just something else to consider in your database design.
The point is that if you can imagine having two addresses for the same user in the future, you should split now and have an address table with a FK pointing back to the users table.
P.S. Your table is missing an identity to be used as PK, something like Id or UserId or DataId, call it the way you want...
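A rough sketch of that split, again with Python's sqlite3 module. The table and column names (users, addresses, kind) are assumptions for this example, and the kind column is just one possible way to tell shipping and billing addresses apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE users (
        user_id  INTEGER PRIMARY KEY,   -- identity column used as PK
        username TEXT NOT NULL UNIQUE
    );
    CREATE TABLE addresses (
        address_id INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL REFERENCES users(user_id),  -- FK back to users
        kind       TEXT NOT NULL,       -- e.g. 'shipping' or 'billing'
        address    TEXT NOT NULL,
        city       TEXT,
        country    TEXT,
        UNIQUE (user_id, kind)          -- at most one address of each kind per user
    );
    """
)

user_id = conn.execute(
    "INSERT INTO users (username) VALUES (?)", ("jsmith",)
).lastrowid
conn.executemany(
    "INSERT INTO addresses (user_id, kind, address, city, country) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (user_id, "shipping", "1 Dock Road", "Leeds", "UK"),
        (user_id, "billing", "9 Bank Street", "Leeds", "UK"),
    ],
)

kinds = conn.execute(
    "SELECT kind FROM addresses WHERE user_id = ? ORDER BY kind", (user_id,)
).fetchall()
```

A user with no address simply has no rows in addresses, so the users table needs no null address columns.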
By adding them to a separate table, you will have an easier time expanding your application if you decide to later. I generally have a simple user table with user_id or id, user_name, first_name, last_name, password, created_at & updated_at. I then have a profile table with the other info.
It's really all preference though.
You should never group two different types of data in a single table, period. The reason is that if your application is intended to be used in production, sooner or later new use cases will come along that need a more highly normalised table structure.
My recommendation - Adhere to SOLID principles even in DB design.
I have a web app where I register users based on their email id.
From a design/ease of use/flexibility point of view, should I assign a unique number to each user or identify user based on emailid?
Advantage of assigning unique number:
I can change the login itself at a later point without losing the data of the user(flexible).
Disadvantage:
I have to deal with numbers when using the sql command line(error prone).
Which is better? Do you see any other issues that need to be considered for either scheme?
The identity of your users should be unique and immutable. Choosing the email address as identity is not a good idea for several reasons:
The email is one facet of the user's identity that can change at any point in time.
You might decide to allow more than one email.
You might decide to add other facets, like OpenID or Live ID, or even just a plain old username.
There's nothing wrong with allowing multiple identities to share the same email facet. It is a rare scenario, but not unheard of.
Normalizing the email address is hard and error prone, so you might have problems enforcing the uniqueness. (Are email addresses case sensitive? Do you ignore . or + inside emails? How do you compare non-English emails?)
Btw, using the email as a public representation of the user identity can be a security and privacy problem, especially if some of your users are under 13 years old. You will need a different public facet for the user identity.
Use both.
You have to add an id because you really don't want other tables to use the email address as a foreign key.
Make the email address unique so that you can still use it to identify a user with sql command line.
Unique number - ALWAYS!
But keep the number hidden from the user.
The user should be allowed to change their email. If this is used as the primary identifier then it can cause lots of complications when the key is used in multiple tables.
You should have another identifier other than the user's email address which is not visible to the user and never changes. You should then enforce uniqueness on the email address so it can be used as a candidate key.
You will find that users will want to change their email address, or anything really which they can see, so you should as good practice have an identifier which cannot be changed.
Dealing with numbers in a sql command object would not really be any more error prone than using the actual email address; if anything I would think it would be less error prone.
Your disadvantage is not a disadvantage. Using numbers with sql is not more or less of a problem than using emails or anything else for that matter.
On the other hand your advantage is quite a strong one, you might want to associate users with each other, different emails with one user account, etc. and always using the email will make things harder.
Think also of urls that include a user identification: an ID is much easier to handle there than an email, where you have to think about the proper url encoding.
So in favour of flexibility and ease of use, I would strongly recommend a unique userID.
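To illustrate the url point, here is a short sketch using urllib.parse.quote from the Python standard library. The /users/ path and the two helper functions are made up for this example.

```python
from urllib.parse import quote


def profile_url_by_email(email):
    """Build a path from an email; reserved characters must be percent-encoded."""
    return "/users/" + quote(email, safe="")


def profile_url_by_id(user_id):
    """Build a path from a numeric id; nothing needs encoding."""
    return f"/users/{user_id}"


email_url = profile_url_by_email("joe+news@example.com")
id_url = profile_url_by_id(42)

print(email_url)  # /users/joe%2Bnews%40example.com
print(id_url)     # /users/42
```

The email version also has to be decoded again on the server side, which is one more place for a mistake to creep in.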
Just some points to consider.
How will you validate the email address?
How do you ensure that it is really unique? (I don't always use my real address, e.g. m.mouse at disney.com.)
I like to use a unique key generated by the database to identify the record and then add attributes which are out of my control separately
A person's email can change but the id will not
Unique numbers. As well as the reasons identified, I think it would be less error prone than using an email address. Shorter, no funny characters, easier to validate, etc.
When designing a user table, what would be the must-have fields from the security/user authentication point of view for a web-based application (.NET and SqlServer 2005)?
I came up with the following fields:
userID
username -- preferably email
passwordHash
onceUsePassword -- to indicate that the password should be changed after login
alternativeContactEmail
userStatusID -- FK to a lookup table with statuses like: active, disabled etc
dateCreated
dateUpdated
lastPasswordUpdate
lastLogon
-- and then the rest like :forename, surname etc which are not of the interest in this question
Am I missing something?
Is standard identity (INT) sufficient for userID or should the GUID be used instead (the userID is not going to be exposed anywhere)?
EDIT:
I am limited to the use of .NET 1.1
(don't ask...)
The salt info will be merged with passwordHash
the account would be unlocked by sending a temporary, single use system generated password to the user email address (hence onceUsePassword field)
Why not just use the built-in SQL Membership Provider if you're using SQL Server anyway? It's much better than rolling your own since it's been tested by a lot of people.
In any case, you should think about adding a salt field to your table.
Salting
Update:
.NET 1.1? I guess that answers my question. Is your application for the consumption of the general public? If so, you might want to add a way for them to unlock their accounts via a secret question.
"onceUsePassword -- to indicate that the password should be changed after login"
If you have to explain it that much, you should rename it. Something like "forceChangePasswordOnLogin".
You should add a "salt" field so that you can salt passwords and defend against dictionary attacks with rainbow tables if your database is ever compromised.
I'm not sure what you mean by "The salt info will be merged with passwordHash". Does that mean that the same salt is used for all password hashes? It would make more sense to generate a random salt for each hash, and store it in a separate field.
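A minimal sketch of per-user salting, written in Python rather than the .NET 1.1 stack from the question, so treat it as an illustration of the idea only. The function names, the 16-byte salt length and the iteration count are assumptions. The hashing itself uses hashlib.pbkdf2_hmac from the standard library.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # assumed work factor; tune for your own hardware


def hash_password(password):
    """Return (salt, digest) using a fresh random salt for this password."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the digest with the stored salt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(digest, expected_digest)


# Two users with the same password end up with different salts and hashes.
salt1, hash1 = hash_password("s3cret")
salt2, hash2 = hash_password("s3cret")
```

Both values would be stored per user, for example in passwordSalt and passwordHash columns, so a table precomputed for one salt is useless against the other rows.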