1:1 Relationships. Split into more than 1 table? Bad? - database

I am creating a mobile game where I am optimistically hoping i'll have millions of players.
I have created a users table that currently has roughly 8 columns (ie. userid, username, password, last_signin, etc)
For every user I'll also need to record the amount of in-game currency they have (ie. gold, silver, gems, etc).
This is a 1:1 relationship (a user will only ever have 1 value defining how much gold they have).
I am no database expert (which is why I am posting here). I worry If I added the gold, silver, gems, etc as new rows in the users table that the users table will be hammered with a crazy amount of queries per second. Everytime someone in the game finds more gold, more silver, logs in, creates an account... the users table will be accessed and/or updated.
Would it be smarter to add the gold, silver, and gems as columns in a new table called "resources" that had the following columns : userid, gold, silver, gems. This new table would have the exact same number of rows as the user table since there is a 1:1 relationship between users and resources. I'm wondering if those queries would be faster since the database data is split up and not all queries would go to the same table.
Clearly to me it seems better to put it all in 1 table since they are 1:1.... but It also seemed like a bad idea to have the majority of the games data in 1 table.
Thanks for any advice you can give!
Ryan

There are plenty of cases where good design calls for two tables in a 1:1 relationship with each other. There is no normalization rule that calls for decomposing tables in this manner. But normalization isn't the only handle on good design.
Access traffic is another handle. Your intuition that access to resources is going to be much more frequent than access to basic user data sounds credible. But you will need to check it out, to make sure that the transactions that access resources don't end up using basic user data anyway. It all boils down to which costs more: a fat user table or more joins.
Other responders have already hinted that there may come a day when the 1:1 relationship becomes a 1:many relationship. I can imagine one. The model of the game player gets expanded where a single user can get involved in multiple distinct instances of the game. In this case, a single user might have the same basic user data in all instances, but different resources in each instance. I have no way of telling if this is ever going to happen in your case. But, if it does, you're going to be better off with a separate resources table.

It really depends on your game design, how big your database is, and how you might expand your database in the future. I would put the resources in a separate table with a foreign key pointing to the user id because:
You can keep the user table slimmer for easier
maintenance/backup.
Simple 1-to-1 JOIN operation between two
tables doesn't take much more resources than having everything in
the same table, as long as you have proper indexing.
By keeping your tables separated, you are practicing separation of concerns;
multiple people can work on different stuff without having to worry
about affecting other tables.
Easier to expand. You may want to add other columns such as birth_date, region, first_name, etc. that
are more relevant to users' personal info to the users table in the
future. It will be confusing if columns of different purposes are
stored together. (In PostgreSQL you can't simply arrange column
order though you can create Views for that.)

This is a 1:1 relationship (a user will only ever have 1 value defining how much gold they have).
... for now ;)
I am no database expert (which is why I am posting here). I worry If I added the gold, silver, gems, etc as new rows in the users table
New columns?
Would it be smarter to add the gold, silver, and gems as columns in a new table called "resources"
Probably, because:
You'll be doing smaller writes when you update the frequently updated part, without rewriting less-modified user data
It makes it easier to audit changes to the user data

Related

Database design, multiple M-M tables or just one?

Today I was designing a database for a potential personal project of mine. Since I couldn't decide what would be a better option I asked my teacher Databases, unfortunately he couldn't tell me which of the two options is better than the other and why.
I designed the database for a dummy data generator. Since I want to generate multilangual data I thought of these tables. (But its a simplification of the tables).
(first and last)names: id, name
streets: id, name
languages: id, name
Each names.name and streets.name originates from a language, sometimes a name can have multiple origins (ex: Nick is both a Dutch as an English name).
Each language has multiple names and streets.
These two rules result in a Many-to-Many relationship. At the moment I've got only two tables, but I know I will get between 10 and 20 of these kind of tables.
The regular way one would do this is just make 10 to 20 Many-to-Many relationship tables.
Another idea I came up with was just one Many-to-Many table with a third column which specifies which table the id relates to.
At the moment I've got the design on my other PC so I will update it with my ideas visualized after dinner (2 hours or so).
Which idea is better and why?
To make the project idea a bit clearer:
It is always a hassle to create good and enough realistic looking working data for projects. This application will generate this data for you and return the needed SQL so you only have to run the queries.
The user comes to the site to get the data. He states his tablename, his columnnames and then he can link the columnnames to types of data, think of:
* Firstname
* Lastname
* Email adress (which will be randomly generated from the name of the person)
* Adress details (street, housenumber, zipcode, place, country)
* A lot more
Then, after linking columns with the types the user can set the number of rows he wants to make. The application will then choose a country at random and generate realistic looking data according to the country they live in.
That's actually an excellent question. This sort of thing leads to a genuine problem in database design and there is a real tradeoff. I don't know what rdbms you are using but....
Basically you have four choices, all of them with serious downsides:
1. One M-M table with check constraints that only one fkey can be filled in besides language and one column per potential table. Ick....
2. One M-M table per relationship. This makes things quite hard to manage over time especially if you need to change something from an int to a bigint at some point.
3. One M-M table with a polymorphic relationship. You lose a lot of referential integrity checks when you do this and to make it safe, have fun coding (and testing!) triggers.
4. Look carefully at the advanced features in your rdbms for a solution. For example in postgresql this can be solved with table inheritance. The downside is that you lose portability and end up in advanced territory.
Unfortunately there is no single definite answer. You need to consider the tradeoffs carefully and decide what makes sense for your project. If I was just working with one RDBMS, I would do the last one. But if not, I would probably do one table per relationship and focus on tooling to manage the problems that come up. But the former preference is about my level of knowledge and confidence, and the latter is a bit more of a personal opinion.
So I hope this helps you look at the tradeoffs and select what is right for you.

How can I store an indefinite amount of stuff in a field of my database table?

Heres a simple version of the website I'm designing: Users can belong to one or more groups. As many groups as they want. When they log in they are presented with the groups the belong to. Ideally, in my Users table I'd like an array or something that is unbounded to which I can keep on adding the IDs of the groups that user joins.
Additionally, although I realize this isn't necessary, I might want a column in my Group table which has an indefinite amount of user IDs which belong in that group. (side question: would that be more efficient than getting all the users of the group by querying the user table for users belonging to a certain group ID?)
Does my question make sense? Mainly I want to be able to fill a column up with an indefinite list of IDs... The only way I can think of is making it like some super long varchar and having the list JSON encoded in there or something, but ewww
Please and thanks
Oh and its a mysql database (my website is in php), but 2 years of php development I've recently decided php sucks and I hate it and ASP .NET web applications is the only way for me so I guess I'll be implementing this on whatever kind of database I'll need for that.
Your intuition is correct; you don't want to have one column of unbounded length just to hold the user's groups. Instead, create a table such as user_group_membership with the columns:
user_id
group_id
A single user_id could have multiple rows, each with the same user_id but a different group_id. You would represent membership in multiple groups by adding multiple rows to this table.
What you have here is a many-to-many relationship. A "many-to-many" relationship is represented by a third, joining table that contains both primary keys of the related entities. You might also hear this called a bridge table, a junction table, or an associative entity.
You have the following relationships:
A User belongs to many Groups
A Group can have many Users
In database design, this might be represented as follows:
This way, a UserGroup represents any combination of a User and a Group without the problem of having "infinite columns."
If you store an indefinite amount of data in one field, your design does not conform to First Normal Form. FNF is the first step in a design pattern called data normalization. Data normalization is a major aspect of database design. Normalized design is usually good design although there are some situations where a different design pattern might be better adapted.
If your data is not in FNF, you will end up doing sequential scans for some queries where a normalized database would be accessed via a quick lookup. For a table with a billion rows, this could mean delaying an hour rather than a few seconds. FNF guarantees a direct access lookup path for each item of data.
As other responders have indicated, such a design will involve more than one table, to be joined at retrieval time. Joining takes some time, but it's tiny compared to the time wasted in sequential scans, if the data volume is large.

Why use a 1-to-1 relationship in database design?

I am having a hard time trying to figure out when to use a 1-to-1 relationship in db design or if it is ever necessary.
If you can select only the columns you need in a query is there ever a point to break up a table into 1-to-1 relationships. I guess updating a large table has more impact on performance than a smaller table and I'm sure it depends on how heavily the table is used for certain operations (read/ writes)
So when designing a database schema how do you factor in 1-to-1 relationships? What criteria do you use to determine if you need one, and what are the benefits over not using one?
From the logical standpoint, a 1:1 relationship should always be merged into a single table.
On the other hand, there may be physical considerations for such "vertical partitioning" or "row splitting", especially if you know you'll access some columns more frequently or in different pattern than the others, for example:
You might want to cluster or partition the two "endpoint" tables of a 1:1 relationship differently.
If your DBMS allows it, you might want to put them on different physical disks (e.g. more performance-critical on an SSD and the other on a cheap HDD).
You have measured the effect on caching and you want to make sure the "hot" columns are kept in cache, without "cold" columns "polluting" it.
You need a concurrency behavior (such as locking) that is "narrower" than the whole row. This is highly DBMS-specific.
You need different security on different columns, but your DBMS does not support column-level permissions.
Triggers are typically table-specific. While you can theoretically have just one table and have the trigger ignore the "wrong half" of the row, some databases may impose additional limits on what a trigger can and cannot do. For example, Oracle doesn't let you modify the so called "mutating" table from a row-level trigger - by having separate tables, only one of them may be mutating so you can still modify the other from your trigger (but there are other ways to work-around that).
Databases are very good at manipulating the data, so I wouldn't split the table just for the update performance, unless you have performed the actual benchmarks on representative amounts of data and concluded the performance difference is actually there and significant enough (e.g. to offset the increased need for JOINing).
On the other hand, if you are talking about "1:0 or 1" (and not a true 1:1), this is a different question entirely, deserving a different answer...
See also: When I should use one to one relationship?
Separation of duties and abstraction of database tables.
If I have a user and I design the system for each user to have an address, but then I change the system, all I have to do is add a new record to the Address table instead of adding a brand new table and migrating the data.
EDIT
Currently right now if you wanted to have a person record and each person had exactly one address record, then you could have a 1-to-1 relationship between a Person table and an Address table or you could just have a Person table that also had the columns for the address.
In the future maybe you made the decision to allow a person to have multiple addresses. You would not have to change your database structure in the 1-to-1 relationship scenario, you only have to change how you handle the data coming back to you. However, in the single table structure you would have to create a new table and migrate the address data to the new table in order to create a best practice 1-to-many relationship database structure.
Well, on paper, normalized form looks to be the best. In real world usually it is a trade-off. Most large systems that I know do trade-offs and not trying to be fully normalized.
I'll try to give an example. If you are in a banking application, with 10 millions passbook account, and the usual transactions will be just a query of the latest balance of certain account. You have table A that stores just those information (account number, account balance, and account holder name).
Your account also have another 40 attributes, such as the customer address, tax number, id for mapping to other systems which is in table B.
A and B have one to one mapping.
In order to be able to retrieve the account balance fast, you may want to employ different index strategy (such as hash index) for the small table that has the account balance and account holder name.
The table that contains the other 40 attributes may reside in different table space or storage, employ different type of indexing, for example because you want to sort them by name, account number, branch id, etc. Your system can tolerate slow retrieval of these 40 attributes, while you need fast retrieval of your account balance query by account number.
Having all the 43 attributes in one table seems to be natural, and probably 'naturally slow' and unacceptable for just retrieving single account balance.
It makes sense to use 1-1 relationships to model an entity in the real world. That way, when more entities are added to your "world", they only also have to relate to the data that they pertain to (and no more).
That's the key really, your data (each table) should contain only enough data to describe the real-world thing it represents and no more. There should be no redundant fields as all make sense in terms of that "thing". It means that less data is repeated across the system (with the update issues that would bring!) and that you can retrieve individual data independently (not have to split/ parse strings for example).
To work out how to do this, you should research "Database Normalisation" (or Normalization), "Normal Form" and "first, second and third normal form". This describes how to break down your data. A version with an example is always helpful. Perhaps try this tutorial.
Often people are talking about a 1:0..1 relationship and call it a 1:1. In reality, a typical RDBMS cannot support a literal 1:1 relationship in any case.
As such, I think it's only fair to address sub-classing here, even though it technically necessitates a 1:0..1 relationship, and not the literal concept of a 1:1.
A 1:0..1 is quite useful when you have fields that would be exactly the same among several entities/tables. For example, contact information fields such as address, phone number, email, etc. that might be common for both employees and clients could be broken out into an entity made purely for contact information.
A contact table would hold common information, like address and phone number(s).
So an employee table holds employee specific information such as employee number, hire date and so on. It would also have a foreign key reference to the contact table for the employee's contact info.
A client table would hold client information, such as an email address, their employer name, and perhaps some demographic data such as gender and/or marital status. The client would also have a foreign key reference to the contact table for their contact info.
In doing this, every employee would have a contact, but not every contact would have an employee. The same concept would apply to clients.
Just a few samples from past projects:
a TestRequests table can have only one matching Report. But depending on the nature of the Request, the fields in the Report may be totally different.
in a banking project, an Entities table hold various kind of entities: Funds, RealEstateProperties, Companies. Most of those Entities have similar properties, but Funds require about 120 extra fields, while they represent only 5% of the records.

Do 1 to 1 relations on db tables smell?

I have a table that has a bunch of fields. The fields can be broken into logical groups - like a job's project manager info. The groupings themselves aren't really entity candidates as they don't and shouldn't have their own PKs.
For now, to group them, the fields have prefixes (PmFirstName for example) but I'm considering breaking them out into multiple tables with 1:1 relations on the main table.
Is there anything I should watch out for when I do this? Is this just a poor choice?
I can see that maybe my queries will get more complicated with all the extra joins but that can be mitigated with views right? If we're talking about a table with less than 100k records is this going to have a noticeable effect on performance?
Edit: I'll justify the non-entity candidate thoughts a little further. This information is entered by our user base. They don't know/care about each other. So its possible that the same user will submit the same "projectManager name" or whatever which, at this point, wouldn't be violating any constraint. Its for us to determine later on down the pipeline if we wanna correlate entries from separate users. If I were to give these things their own key they would grow at the same rate the main table grows - since they are essentially part of the same entity. At no pt is a user picking from a list of available "project managers".
So, given the above, I don't think they are entities. But maybe not - if you have further thoughts please post.
I don't usually use 1 to 1 relations unless there is a specific performance reason for it. For example storing an infrequently used large text or BLOB type field in a separate table.
I would suspect that there is something else going on here though. In the example you give - PmFirstName - it seems like maybe there should be a single pm_id relating to a "ProjectManagers" or "Employees" table. Are you sure none of those groupings are really entity candidates?
To me, they smell unless for some rows or queries you won't be interested in the extra columns. e.g. if for a large portion of your queries you are not selecting the PmFirstName columns, or if for a large subset of rows those columns are NULL.
I like the smells tag.
I use 1 to 1 relationships for inheritance-like constructs.
For example, all bonds have some basic information like CUSIP, Coupon, DatedDate, and MaturityDate. This all goes in the main table.
Now each type of bond (Treasury, Corporate, Muni, Agency, etc.) also has its own set of columns unique to it.
In the past we would just have one incredibly wide table with all that information. Now we break out the type-specific info into separate tables, which gives us much better performance.
For now, to group them, the fields have prefixes (PmFirstName for example) but I'm considering breaking them out into multiple tables with 1:1 relations on the main table.
Create a person table, every database needs this. Then in your project table have a column called PMKey which points to the person table.
Why do you feel that the group of fields are not an entity candidates? If they are not then why try to identify them with a prefix?
Either drop the prefixes or extract them into their own table.
It is valuable splitting them up into separate tables if they are separate logical entities that could be used elsewhere.
So a "Project Manager" could be 1:1 with all the projects currently, but it makes sense that later you might want to be able to have a Project Manager have more than one project.
So having the extra table is good.
If you have a PrimaryFirstName,PrimaryLastName,PrimaryPhone, SecondaryFirstName,SecondaryLastName,SEcondaryPhone
You could just have a "Person" table with FirstName, LastName, Phone
Then your original Table only needs "PrimaryId" and "SecondaryId" columns to replace the 6 columns you previously had.
Also, using SQL you can split up filegroups and tables across physical locations.
So you could have a POST table, and a COMMENT Table, that have a 1:1 relationship, but the COMMENT table is located on a different filegroup, and on a different physical drive with more memory.
1:1 does not always smell. Unless it has no purpose.

Separating user table from people table in a relational database

I've done many web apps where the first thing you do is make a user table with usernames, passwords, names, e-mails and all of the other usual flotsam. My current project presents a situation where non-users records need to function similarly to users, but do not need to the ability to be a first order user.
Is it reasonable to create a second table, people_tb, that is the main relational table and data store, and only use the users_tb for authentication? Does separating user_tb from people_tb present any problems? If this is commonly done, what are some strategies and solutions as well as drawbacks?
This is certainly a good idea, as you are normalizing the database. I have done a similar design in an app that I am writing, where I have an employee table and a user table. Users may a from an external company or an employee, so I have separate tables because an employee is always a user, but a user may not be an employee.
The issues that you'll run into is that whenever you use the user table, you'll nearly always want the person table to get the name or other common attributes you would want to show up.
From a coding standpoint, if you're using straight SQL, it will take a little more effort to mentally parse the select statement. It may be a little more complicated if you're using an ORM library. I don't have enough experience with those.
In my application, I'm writing it in Ruby on Rails, so I'm constantly doing things like employee.user.name, where if I kept them together, it would be just employee.name or user.name.
From a performance standpoint, you are hitting two tables instead of one, but given proper indexes, it should be negligible. If you had an index that contained the primary key and the person name, for instance, the database would hit the user table, then the index for the person table (with a nearly direct hit), so the performance would be nearly the same as having one table.
You could also create a view in the database to keep both tables joined together to give you additional performance enhancements. I know in the later versions of Oracle you can even put an index on a view if needed to increase performance.
I routinely do that because for me the concept of "user" (username, password, create date, last login date) is different from "person" (name, address, phone, email). One of the drawbacks that you may find is that your queries will often require more joins to get the info you're looking for. If all you have is a login name, you'll need to join the "people" table to get the first and last name for example. If you base everything around the user id primary key, this is mitigated a bit, but still pops up.
If user_tb has auth info, I would very much keep it separate from people_tb. I would however keep a relationship between the two, and most of users' info would be stored in people_tb except all of the info needed for auth (which i guess will not be used for much else) Its a nice tradeoff between design and efficiency i think.
That is definitely what we do as we have millions of people records and only thousands of users. We also separate address, phones and emails into relational tables as many people have more than one of each of these things. Critial is to not rely on name as the identifier as name is not unique. Make sure the tables are joined through some type of surrogate key (an integer or a GUID is preferable) not name.
I always try to avoid as much data repetition as possible. If not all people need to login, you can have a generic people table with the information that applies to both people and users (eg. firstname, lastname, etc).
Then for people that login, you can have a users table that has a 1~1 relationship with people. This table can store the username and password.
I'd say go for the normalized design (two tables) and only denormalize (go down to one user/person table) if it will really make your life easier down the line. If however practically all people are also users it may be simpler to denormalize up front. Its up to you; I have used the normalized approach without problems.
Very reasonable.
As an example, take a look at the aspnet_* services tables here.
Their built in schema has a aspnet_Users and aspnet_Membership with the later table having more extended information about a given user (hashed passwords, etc) but the aspnet_User.UserID is used in the other portions of the schema for referential integrity etc.
Bottom line, it's very common, and good design, to have attributes in a separate table if they are different entities, as in your case.

Resources