Separating user table from people table in a relational database - database

I've done many web apps where the first thing you do is make a user table with usernames, passwords, names, e-mails and all of the other usual flotsam. My current project presents a situation where non-users records need to function similarly to users, but do not need to the ability to be a first order user.
Is it reasonable to create a second table, people_tb, that is the main relational table and data store, and only use the users_tb for authentication? Does separating user_tb from people_tb present any problems? If this is commonly done, what are some strategies and solutions as well as drawbacks?

This is certainly a good idea, as you are normalizing the database. I have done a similar design in an app that I am writing, where I have an employee table and a user table. Users may a from an external company or an employee, so I have separate tables because an employee is always a user, but a user may not be an employee.
The issues that you'll run into is that whenever you use the user table, you'll nearly always want the person table to get the name or other common attributes you would want to show up.
From a coding standpoint, if you're using straight SQL, it will take a little more effort to mentally parse the select statement. It may be a little more complicated if you're using an ORM library. I don't have enough experience with those.
In my application, I'm writing it in Ruby on Rails, so I'm constantly doing things like employee.user.name, where if I kept them together, it would be just employee.name or user.name.
From a performance standpoint, you are hitting two tables instead of one, but given proper indexes, it should be negligible. If you had an index that contained the primary key and the person name, for instance, the database would hit the user table, then the index for the person table (with a nearly direct hit), so the performance would be nearly the same as having one table.
You could also create a view in the database to keep both tables joined together to give you additional performance enhancements. I know in the later versions of Oracle you can even put an index on a view if needed to increase performance.

I routinely do that because for me the concept of "user" (username, password, create date, last login date) is different from "person" (name, address, phone, email). One of the drawbacks that you may find is that your queries will often require more joins to get the info you're looking for. If all you have is a login name, you'll need to join the "people" table to get the first and last name for example. If you base everything around the user id primary key, this is mitigated a bit, but still pops up.

If user_tb has auth info, I would very much keep it separate from people_tb. I would however keep a relationship between the two, and most of users' info would be stored in people_tb except all of the info needed for auth (which i guess will not be used for much else) Its a nice tradeoff between design and efficiency i think.

That is definitely what we do as we have millions of people records and only thousands of users. We also separate address, phones and emails into relational tables as many people have more than one of each of these things. Critial is to not rely on name as the identifier as name is not unique. Make sure the tables are joined through some type of surrogate key (an integer or a GUID is preferable) not name.

I always try to avoid as much data repetition as possible. If not all people need to login, you can have a generic people table with the information that applies to both people and users (eg. firstname, lastname, etc).
Then for people that login, you can have a users table that has a 1~1 relationship with people. This table can store the username and password.

I'd say go for the normalized design (two tables) and only denormalize (go down to one user/person table) if it will really make your life easier down the line. If however practically all people are also users it may be simpler to denormalize up front. Its up to you; I have used the normalized approach without problems.

Very reasonable.
As an example, take a look at the aspnet_* services tables here.
Their built in schema has a aspnet_Users and aspnet_Membership with the later table having more extended information about a given user (hashed passwords, etc) but the aspnet_User.UserID is used in the other portions of the schema for referential integrity etc.
Bottom line, it's very common, and good design, to have attributes in a separate table if they are different entities, as in your case.

Related

1:1 Relationships. Split into more than 1 table? Bad?

I am creating a mobile game where I am optimistically hoping i'll have millions of players.
I have created a users table that currently has roughly 8 columns (ie. userid, username, password, last_signin, etc)
For every user I'll also need to record the amount of in-game currency they have (ie. gold, silver, gems, etc).
This is a 1:1 relationship (a user will only ever have 1 value defining how much gold they have).
I am no database expert (which is why I am posting here). I worry If I added the gold, silver, gems, etc as new rows in the users table that the users table will be hammered with a crazy amount of queries per second. Everytime someone in the game finds more gold, more silver, logs in, creates an account... the users table will be accessed and/or updated.
Would it be smarter to add the gold, silver, and gems as columns in a new table called "resources" that had the following columns : userid, gold, silver, gems. This new table would have the exact same number of rows as the user table since there is a 1:1 relationship between users and resources. I'm wondering if those queries would be faster since the database data is split up and not all queries would go to the same table.
Clearly to me it seems better to put it all in 1 table since they are 1:1.... but It also seemed like a bad idea to have the majority of the games data in 1 table.
Thanks for any advice you can give!
Ryan
There are plenty of cases where good design calls for two tables in a 1:1 relationship with each other. There is no normalization rule that calls for decomposing tables in this manner. But normalization isn't the only handle on good design.
Access traffic is another handle. Your intuition that access to resources is going to be much more frequent than access to basic user data sounds credible. But you will need to check it out, to make sure that the transactions that access resources don't end up using basic user data anyway. It all boils down to which costs more: a fat user table or more joins.
Other responders have already hinted that there may come a day when the 1:1 relationship becomes a 1:many relationship. I can imagine one. The model of the game player gets expanded where a single user can get involved in multiple distinct instances of the game. In this case, a single user might have the same basic user data in all instances, but different resources in each instance. I have no way of telling if this is ever going to happen in your case. But, if it does, you're going to be better off with a separate resources table.
It really depends on your game design, how big your database is, and how you might expand your database in the future. I would put the resources in a separate table with a foreign key pointing to the user id because:
You can keep the user table slimmer for easier
maintenance/backup.
Simple 1-to-1 JOIN operation between two
tables doesn't take much more resources than having everything in
the same table, as long as you have proper indexing.
By keeping your tables separated, you are practicing separation of concerns;
multiple people can work on different stuff without having to worry
about affecting other tables.
Easier to expand. You may want to add other columns such as birth_date, region, first_name, etc. that
are more relevant to users' personal info to the users table in the
future. It will be confusing if columns of different purposes are
stored together. (In PostgreSQL you can't simply arrange column
order though you can create Views for that.)
This is a 1:1 relationship (a user will only ever have 1 value defining how much gold they have).
... for now ;)
I am no database expert (which is why I am posting here). I worry If I added the gold, silver, gems, etc as new rows in the users table
New columns?
Would it be smarter to add the gold, silver, and gems as columns in a new table called "resources"
Probably, because:
You'll be doing smaller writes when you update the frequently updated part, without rewriting less-modified user data
It makes it easier to audit changes to the user data

SQLite: Individual tables per user or one table for them all?

I've already designed a website which uses an SQLite database. Instead of using one large table, I've designed it so that when a user signs up, a individual table is created for them. Each user will possibly use several hundreds of records. I done this because I thought it would be easier to structure and access.
I found on other questions on this site that one table is better than using many tables for each user.
Would it be worth redesigning my site so that instead of having many tables, there would be one large table? The current method of mine seems to work well though it is still in development so I'm not sure how well it would stack up in a real environment.
The question is: Would changing the code so that there is one large database instead of many individual ones be worth it in terms of performance, efficiency, organisation and space?
SQLite: Creating a user's table.
CREATE TABLE " + name + " (id INTEGER PRIMARY KEY, subject TEXT, topic TEXT, questionNumber INTEGER, question TEXT, answer TEXT, color TEXT)
SQLite: Adding an account to the accounts table.
"INSERT INTO accounts (name, email, password, activated) VALUES (?,?,?,?)", (name, email, password, activated,)
Please note that I'm using python with Flask if it makes any difference.
EDIT
I am also aware that there are questions like this already, however none state whether the advantages or disadvantages will be worth it.
In an object oriented language, would you make a class for every user? Or would you have an instance of a class for each user?
Having one table per user is a really bad design.
You can't search messages based on any field that isn't the username. With your current solution, how would you find all messages for a certain questionNumber?
You can't join with the messages tables. You have to make two queries, one to find the table name and one to actually query the table, which requires two round-trips to the database server.
Each user now has their own table schema. On an upgrade, you have to apply your schema migration to every messages table, and God help you if some of the tables are inconsistent with the rest.
It's effectively impossible to have foreign keys pointing to your messages table. You can't specify the table that the foreign key column points to, because it won't be the same.
You can have name conflicts with your current setup. What if someone registers with the username accounts? Admittedly, this is easy to fix by adding a user_ prefix, but still something to keep in mind.
SQL injection vulnerabilities. What if I register a user named lol; DROP TABLE accounts; --? Query parameters, the primary way of preventing such attacks, don't work on table names.
I could go on.
Please merge all of the tables, and read up on database normalization.

Performance in database design

I have to implement a testing platform. My database needs the following tables: Students, Teachers, Admins, Personnel and others. I would like to know if it's more efficient to have the FirstName and LastName in each of these tables, or to have another table, Persons, and each of the other table to be linked to this one with PersonID.
Personally, I like it this way, although trickier to implement, because I think it's cleaner, especially if you look at it from the object-oriented point of view. Would this add an unnecessary overhead to the database?
Don't know if it helps to mention I would like to use SQL Server and ADO.NET Entity Framework.
As you've explicitly mentioned OO and that you're using EntityFramework, perhaps its worth approaching the problem instead from how the framework is intended to work - rather than just building a database structure and then trying to model it?
Entity Framework Code First Inheritance : Table Per Hierarchy and Table Per Type is a nice introduction to the various strategies that you could pick from.
As for the note on adding unnecessary overhead to the database - I wouldn't worry about it just yet. EF is generally about getting a product built more rapidly and as it has to cope with a more general case, doesn't always produce the most efficient SQL. If the performance is a problem after your application is built, working and correct you can revisit and fix up the most inefficient stuff then.
If there is a person overlap between the mentioned tables, then yes, you should separate them out into a Persons table.
If you are only tracking what role each Person has (i.e. Student vs. Teacher etc) then you might consider just having the following three tables: Persons, Roles, and a bridge table PersonRoles.
On the other hand, if each role has it's own unique fields, then you should carry on as you are and leave each of these tables separate with a foreign key of PersonID.
If the attributes (i.e. First Name, Last Name, Gender etc) of these entities (i.e. Students, Teachers, Admins and Personnel) are exactly the same then you could just make a single table for all the entities with PersonType or Role attribute added to distinguish each person's role. However, if the entities has a lot of different attributes then it would be better that you create separate tables otherwise you will have normalization problem.
Yes that is a very bad way of structuring a DB. The DB structure should be designed based on the Normalizations.
Please check the normalization forms.
U should avoid the duplicate data as much as possible, else the queries will become slower.
And the main problem is when u r trying to get data that is associated with more than one or two tables.

Why use a 1-to-1 relationship in database design?

I am having a hard time trying to figure out when to use a 1-to-1 relationship in db design or if it is ever necessary.
If you can select only the columns you need in a query is there ever a point to break up a table into 1-to-1 relationships. I guess updating a large table has more impact on performance than a smaller table and I'm sure it depends on how heavily the table is used for certain operations (read/ writes)
So when designing a database schema how do you factor in 1-to-1 relationships? What criteria do you use to determine if you need one, and what are the benefits over not using one?
From the logical standpoint, a 1:1 relationship should always be merged into a single table.
On the other hand, there may be physical considerations for such "vertical partitioning" or "row splitting", especially if you know you'll access some columns more frequently or in different pattern than the others, for example:
You might want to cluster or partition the two "endpoint" tables of a 1:1 relationship differently.
If your DBMS allows it, you might want to put them on different physical disks (e.g. more performance-critical on an SSD and the other on a cheap HDD).
You have measured the effect on caching and you want to make sure the "hot" columns are kept in cache, without "cold" columns "polluting" it.
You need a concurrency behavior (such as locking) that is "narrower" than the whole row. This is highly DBMS-specific.
You need different security on different columns, but your DBMS does not support column-level permissions.
Triggers are typically table-specific. While you can theoretically have just one table and have the trigger ignore the "wrong half" of the row, some databases may impose additional limits on what a trigger can and cannot do. For example, Oracle doesn't let you modify the so called "mutating" table from a row-level trigger - by having separate tables, only one of them may be mutating so you can still modify the other from your trigger (but there are other ways to work-around that).
Databases are very good at manipulating the data, so I wouldn't split the table just for the update performance, unless you have performed the actual benchmarks on representative amounts of data and concluded the performance difference is actually there and significant enough (e.g. to offset the increased need for JOINing).
On the other hand, if you are talking about "1:0 or 1" (and not a true 1:1), this is a different question entirely, deserving a different answer...
See also: When I should use one to one relationship?
Separation of duties and abstraction of database tables.
If I have a user and I design the system for each user to have an address, but then I change the system, all I have to do is add a new record to the Address table instead of adding a brand new table and migrating the data.
EDIT
Currently right now if you wanted to have a person record and each person had exactly one address record, then you could have a 1-to-1 relationship between a Person table and an Address table or you could just have a Person table that also had the columns for the address.
In the future maybe you made the decision to allow a person to have multiple addresses. You would not have to change your database structure in the 1-to-1 relationship scenario, you only have to change how you handle the data coming back to you. However, in the single table structure you would have to create a new table and migrate the address data to the new table in order to create a best practice 1-to-many relationship database structure.
Well, on paper, normalized form looks to be the best. In real world usually it is a trade-off. Most large systems that I know do trade-offs and not trying to be fully normalized.
I'll try to give an example. If you are in a banking application, with 10 millions passbook account, and the usual transactions will be just a query of the latest balance of certain account. You have table A that stores just those information (account number, account balance, and account holder name).
Your account also have another 40 attributes, such as the customer address, tax number, id for mapping to other systems which is in table B.
A and B have one to one mapping.
In order to be able to retrieve the account balance fast, you may want to employ different index strategy (such as hash index) for the small table that has the account balance and account holder name.
The table that contains the other 40 attributes may reside in different table space or storage, employ different type of indexing, for example because you want to sort them by name, account number, branch id, etc. Your system can tolerate slow retrieval of these 40 attributes, while you need fast retrieval of your account balance query by account number.
Having all the 43 attributes in one table seems to be natural, and probably 'naturally slow' and unacceptable for just retrieving single account balance.
It makes sense to use 1-1 relationships to model an entity in the real world. That way, when more entities are added to your "world", they only also have to relate to the data that they pertain to (and no more).
That's the key really, your data (each table) should contain only enough data to describe the real-world thing it represents and no more. There should be no redundant fields as all make sense in terms of that "thing". It means that less data is repeated across the system (with the update issues that would bring!) and that you can retrieve individual data independently (not have to split/ parse strings for example).
To work out how to do this, you should research "Database Normalisation" (or Normalization), "Normal Form" and "first, second and third normal form". This describes how to break down your data. A version with an example is always helpful. Perhaps try this tutorial.
Often people are talking about a 1:0..1 relationship and call it a 1:1. In reality, a typical RDBMS cannot support a literal 1:1 relationship in any case.
As such, I think it's only fair to address sub-classing here, even though it technically necessitates a 1:0..1 relationship, and not the literal concept of a 1:1.
A 1:0..1 is quite useful when you have fields that would be exactly the same among several entities/tables. For example, contact information fields such as address, phone number, email, etc. that might be common for both employees and clients could be broken out into an entity made purely for contact information.
A contact table would hold common information, like address and phone number(s).
So an employee table holds employee specific information such as employee number, hire date and so on. It would also have a foreign key reference to the contact table for the employee's contact info.
A client table would hold client information, such as an email address, their employer name, and perhaps some demographic data such as gender and/or marital status. The client would also have a foreign key reference to the contact table for their contact info.
In doing this, every employee would have a contact, but not every contact would have an employee. The same concept would apply to clients.
Just a few samples from past projects:
a TestRequests table can have only one matching Report. But depending on the nature of the Request, the fields in the Report may be totally different.
in a banking project, an Entities table hold various kind of entities: Funds, RealEstateProperties, Companies. Most of those Entities have similar properties, but Funds require about 120 extra fields, while they represent only 5% of the records.

Creating several database tables for user data?

I need to have a lot of user data in the database. Now, I've been thinking about having two tables, users that would have only the id, username and password and another table userData that would have everything else like name, lastname etc.
Is this a prefered method?
The simplest design would put all the fields in one table. From that point, though, there are a bunch of reasons you might want to consider splitting that information up into multiple tables. From your description, I cant' tell whether there are any valid reasons to do so.
If you start with one table, you might find it advantageous to split the data for reasons such as:
Normalization.
Reducing contention (different parts of the app update different information)
Truly huge column lists (look into the limit for your DB)
Other?? (how you're going to maintain your app, maybe?)
In short, I'd try to start simple and have a reason to pick the more complex design if you go that route.
There is nothing wrong with that design IMHO. You can have a users table and link it to a users_custom table that has additional information. Just be consistant with your design. Just remember that in order to get any additional user information you will always have to JOIN to that data.
To me this is a matter of preference, if you feel that this table will grow over time, consider your design, if not just keep it all in one table and properly index columns that you deem necessary.
You can go further by having a UserLog table to build a historical view of values as they change.
Yes it is :) In theory there are this so called "normal forms" (3NF BCnF, etc...). Using them, means seperating table into smaller ones :)
I think it might be better for you to keep it all in one table. Assuming you will be enforcing unique usernames, all the fields (password, first_name, and last_name) have a functional dependency on username. Therefore, you can put them all in the same table and still have a normalized database.
Although you can certainly separate first_name and last_name into their own table, queries will get a lot easier (fewer JOINs) if you keep all those fields in one table.

Resources