Multiple Tables or multiple fields

Multiple Tables or multiple fields - database

What would be the best way to do this and why?
Here is a quick look at a part of my database design, I'm looking for the best way to organize this data.
"Leads" has many "Students", Leads has many "Contacts"
"Students" belongs to "Leads" and belongs to "People"
id, person_id, lead_id
"Contacts" belongs to "Leads" and belongs to "People"
id, person_id, lead_id
I want to be able to signify which contact is going to be a "payer" and if a contact would be the primary contact or not.
I thought originally I would add two more tables like this:
"PrimaryContacts" belongs to "Contacts"
id, contact_id
"Payer" belongs to "Contacts"
id, contact_id
Then I realized it seems kind of over kill to add two more tables with something I can easily represent in the initial Contacts table like this
"Contacts"
id, person_id, lead_id, type, payer
Then I could have type be 1 or 2, meaning primary or secondary, and then the payer field would be 1 or 2 meaning they either are paying or they aren't.
Is there a benefit of doing it one way or the other or does it matter at all?
Thanks!

I have to admit I'm a little confused by your requirements, but interpreting literally what you say seems to lead to the following database model:
The Contacts.payer flag enables you to have any number of payers, regardless of their primary status.
There really is no need for a separate Payer table in this case.
The Leads.primary_contact_id is a NULL-able FK towards the Contacts, which is what lets you have 0 or 1 primary contact per lead (to avoid the possibility of 0 primary contacts, you'd need a NOT NULL, but this would lead to an insertion cycle, which would have to be resolved through deferred constraints, which are not supported in MySQL).
However, this doesn't guarantee that the primary contact belongs to its own lead (i.e. Contacts.lead_id could be different from Leads.lead_id even when Contacts.contact_id matches Leads.contact_id). Is that a problem? If yes, you'd need a liberal application of identifying relationships and composite PKs, which could be a problem for ORM.
Separate PrimaryContacts table would have a very similar effect to the Leads.primary_contact_id (assuming you got your PK right), and would even have the same problem of allowing 0 primary contacts and lead mismatches. Just having a "backward" FK is simpler and more efficient from the database perspective (though I'm not sure if that's still true from the CakePHP perspective).
Unfortunately, I'm not familiar with CakePHP - hopefully you'll be able to "translate" this model there on your own.

Related

Linking an address table to multiple other tables

I have been asked to add a new address book table to our database (SQL Server 2012).
To simplify the related part of the database, there are three tables each linked to each other in a one to many fashion: Company (has many) Products (has many) Projects and the idea is that one or many addresses will be able to exist at any one of these levels. The thinking is that in the front-end system, a user will be able to view and select specific addresses for the project they specify and more generic addresses relating to its parent product and company.
The issue now if how best to model this in the database.
I have thought of two possible ideas so far so wonder if anyone has had a similar type of relationship to model themselves and how they implemented it?
Idea one:
The new address table will additionally contain three fields: companyID, productID and projectID. These fields will be related to the relevant tables and be nullable to represent company and product level addresses. e.g. companyID 2, productID 1, projectID NULL is a product level address.
My issue with this is that I am storing the relationship information in the table so if a project is ever changed to be related to a different product, the data in this table will be incorrect. I could potentially NULL all but the level I am interested in but this will make getting parent addresses a little harder to get
Idea two:
On the address table have a typeID and a genericID. genericID could contain the IDs from the Company, Product and Project tables with the typeID determining which table it came from. I am a little stuck how to set up the necessary constraints to do this though and wonder if this is going to get tricky to deal with in the future
Many thanks,

I will suggest using Idea one and preventing Idea two.
Second Idea is called Polymorphic Association anti pattern
Objective: Reference Multiple Parents
Resulting side effect: Using dual-purpose foreign key will violating first normal form (atomic issue), loosing referential integrity
Solution: Simplify the Relationship
The simplification of the relationship could be obtained in two ways:
Having multiple null-able forging keys (idea number 1): That will be
simple and applicable if the tables(product, project,...) that using
the relation are limited. (think about when they grow up to more)
Another more generic solution will be using inheritance. Defining a
new entity as the base table for (product, project,...) to satisfy
Addressable. May naming it organization-unit be more rational. Primary key of this organization_unit table will be the primary key of (product, project,...). Other collections like Address, Image, Contract ... tables will have a relation to this base table.

It sounds like you could use Junction tables http://en.wikipedia.org/wiki/Junction_table.
They will give you the flexibility you need to maintain your foreign key restraints, as well as share addresses between levels or entities if that is desired.
One for Company_Address, Product_Address, and Project_Address

Relational database: indirect reference to a "foreign key"

I have a data schema similar to the following:
USERS:
id
name
email
phone number
...
PHOTOS:
id
width
height
filepath
...
I have an auditing table for any changes to the system
LOGS:
id
acting_user
date
record_type (enum: "users", "photos", "...")
record_id
record_field
new_value
Is there a name for this setup where an enum in one of the fields refers to the name of one of the other table? And effectively, the record_type and record_id together are a foreign key to the record in the other table? Is this an anti-pattern? (Note: new_value, and all the thing we would be logging are the same data type, strings).

Is this an anti-pattern?
Yes. Any pattern that makes you enforce referential integrity manually1 is an anti-pattern.
Here is why using FOREIGN KEYs is so important and here is what to do in cases like yours.
Is there a name for this setup where an enum in one of the fields refers to the name of one of the other table?
There is no standard term that I know of, but I heard people calling it "generic" or "polymorphic" FKs.
1 As opposed to FOREIGN KEYs built-into the DBMS.

Actually, I think 'Anti-Pattern' is a pretty good name for this set up, but it can be a realistic way to go - especially in this example.
I'll add a similar example with a new table which records LIKES of users' photos, etc, and show why it's bad. Then I'll explain why it might not ne too bad for your LOGS example.
The LIKES table is:
Id
LikedByUserId
RecordType ("users", "photos", "...")
RecordId
This is pretty much the same as the LOGS table. The problem with this is that you cannot make RecordId a foreign key to the USERS table as well as to the PHOTOS table as well as any other tables. If User 1234 is being liked, you couldn't insert it unless there was a PHOTO with ID 1234 and so on. For this reason, all RDBMS's that I know of will not let a Foreign Key be defined with multiple Primary keys - after all, Primary means 'only one' amongst other things.
So you'ld have to create the LIKES table with no relational integrity. This may not be a bad thinbg sometimes, but in this case I'd think I'd want an important table such as LIKES to have valid entries.
To do LIKES properly, I would create the table as:
Id
LikedByUserId (allow null)
PhotoId (allow null)
OtherThingId (allow null)
...and create the appropriate foreign keys. This will actually make queries that read the data easier to read and maintain and probably more efficient too.
However, for a table like LOGS which probably isn't central to the functionality of my system and I'm only doing some ad-hoc querying from to check what's been happening, then I might not want to put in the extra effort and add the complexity that results in more efficient reading. I'm not sure I would actually skip it, though. It is an anti-pattern but depending on usage it might be OK.
To emphasise the point, I would only do this if the system never queried the table; if the only people who look at the data are admin's running ad-hoc queries against it then it might be OK.
Cheers -

If my entity has a (0-1):1 relation to another entity, how would I model that in the database?

For example, lets say I have an entity called user and an entity called profile_picture. A user may have none or one profile picture.
So I thought, I would just create a table called "user" with this fields:
user: user_id, profile_picture_id
(I left all other attributes like name, email, etc. away, to simplify this)
Ok, so if an user would have no profile_picture, it's id would be NULL in my relational model. Now someone told me that I have to avoid setting anything to NULL, because NULL is "bad".
What do you think about this? Do I have to take off that profile_picture_id from the user table and create a link-table like user__profile_picture with user_id, profile_picture_id?
Which would be considered to be "better practice" in database design?

This is a perfectly reasonable model. True, you can take the approach of creating a join table for a 1:1 relationship (or, somewhat better, you could put user_id in the profile_picture table), but unless you think that very few users will have profile pictures then that's likely a needless complication.
Readability is an important component in relational design. Do you consider the profile picture to be an attribute of the user, or the user to be an attribute of the profile picture? You start from what makes logical sense, then optimize away the intuitive design as you find it necessary through performance testing. Don't prematurely optimize.

NULL isn't "bad". It means "I don't know." It's not wrong for you or your schema to admit it.

"NULL is bad" is a rather poor excuse for a reason to do (or not do) something.
That said, you may want to model this as a dependent table, where the user_id is both the primary key and a foreign key to the existing table.
Something like this:
Users UserPicture Picture
---------------- -------------------- -------------------
| User_Id (PK) |__________| User_Id (PK, FK) |__________| Picture_Id (PK) |
| ... | | Picture_Id (FK) | | ... |
---------------- -------------------- -------------------
Or, if pictures are dependent objects (don't have a meaningful lifetime independent of users) merge the UserPicture and Picture tables, with User_Id as the PK and discard the Picture_Id.
Actually, looking at it again, this really doesn't gain you anything - you have to do a left join vs. having a null column, so the other scenario (put the User_Id in the Picture table) or just leave the Picture_Id right in the Users table both make just as much sense.

Your user table should not have a nullable field called profile_picture_id. It would be better to have a user_id column in the profile_picture table. It should of course be a foreign key to the user table.

Since when is a nullable foreign key relationship "bad?" Honestly introducing another table here seems kind of silly since there's no possibility to have more than one profile picture. Your current schema is more than acceptable. The "null is bad" argument doesn't hold any water in my book.
If you're looking for a slightly better schema, then you could do something like drop the "profile_picture_id" column from the users table, and then make a "user_id" column in the pictures table with a foreign key relationship back to users. Then you could even enforce a UNIQUE constraint on the user_id foreign key column so that you can't have more than one instance of a user_id in that table.
EDIT: It's also worth noting that this alternate schema could be a little bit more future-proof should you decide to allow users to have more than one profile picture in the future. You can simply drop the UNIQUE constraint on the foreign key and you're done.

It is true that having many columns with null values is not recommended. I would suggest you make the picture table a weak entity of user table and have an identifying relationship between the two. Picture table entries would depend on user id.

Make the profile picture a nullable field on the user table and be done with it. Sometimes people normalize just for normalization sake. Null is perfectly fine, and in DB2, NULL is a first class citizen of values with NULL being included in indices.

I agree that NULL is bad. It is not relational-database-style.
Null is avoided by introducing an extra table named UserPictureIds. It would have two columns, UserId and PictureId. If there's none, it simply would not have the respective line, while user is still there in Users table.
Edit due to peer pressure
This answer focuses not on why NULL is bad - but, on how to avoid using NULLs in your database design.
For evaluating (NULL==NULL)==(NULL!=NULL), please refer to comments and google.

Modelling inheritance in a database

For a database assignment I have to model a system for a school. Part of the requirements is to model information for staff, students and parents.
In the UML class diagram I have modelled this as those three classes being subtypes of a person type. This is because they will all require information on, among other things, address data.
My question is: how do I model this in the database (mysql)?
Thoughts so far are as follows:
Create a monolithic person table that contains all the information for each type and will have lots of null values depending on what type is being stored. (I doubt this would go down well with the lecturer unless I argued the case very convincingly).
A person table with three foreign keys which reference the subtypes but two of which will be null - in fact I'm not even sure if that makes sense or is possible?
According to this wikipage about django it's possible to implement the primary key on the subtypes as follows:
"id" integer NOT NULL PRIMARY KEY REFERENCES "supertype" ("id")
Something else I've not thought of...
So for those who have modelled inheritance in a database before; how did you do it? What method do you recommend and why?
Links to articles/blog posts or previous questions are more than welcome.
Thanks for your time!
UPDATE
Alright thanks for the answers everyone. I already had a separate address table so that's not an issue.
Cheers,
Adam

4 tables staff, students, parents and person for the generic stuff.
Staff, students and parents have forign keys that each refer back to Person (not the other way around).
Person has field that identifies what the subclass of this person is (i.e. staff, student or parent).
EDIT:
As pointed out by HLGM, addresses should exist in a seperate table, as any person may have multiple addresses. (However - I'm about to disagree with myself - you may wish to deliberately constrain addresses to one per person, limiting the choices for mailing lists etc).

Well I think all approaches are valid and any lecturer who marks down for shoving it in one table (unless the requirements are specific to say you shouldn't) is removing a viable strategy due to their own personal opinion.
I highly recommend that you check out the documentation on NHibernate as this provides different approaches for performing the above. Which I will now attempt to poorly parrot.
Your options:
1) One table with all the data that has a "delimiter" column. This column states what kind of person the person is. This is viable in simple scenarios and (seriously) high performance where the joins will hurt too much
2) Table per class which will lead to duplication of columns but will avoid joins again, so its simple and a lil faster (although only a lil and indexing mitigates this in most scenarios).
3) "Proper" inheritence. The normalised version. You are almost there but your key is in the wrong place IMO. Your Employee table should contain a PersonId so you can then do:
select employee.id, person.name from employee inner join person on employee.personId = person.personId
To get all the names of employees where name is only specified on the person table.

I would go for #3.
Your goal is to impress a lecturer, not a PM or customer. Academics tend to dislike nulls and might (subconciously) penalise you for using the other methods (which rely on nulls.)
And you don't necessarily need that django extension (PRIMARY KEY ... REFERENCES ...) You could use an ordinary FOREIGN KEY for that.

"So for those who have modelled inheritance in a database before; how did you do it? What method do you recommend and why?
"
Methods 1 and 3 are good. The differences are mostly in what your use cases are.
1) adaptability -- which is easier to change? Several separate tables with FK relations to the parent table.
2) performance -- which requires fewer joins? One single table.
Rats. No design accomplishes both.
Also, there's a third design in addition to your mono-table and FK-to-parent.
Three separate tables with some common columns (usually copy-and-paste of the superclass columns among all subclass tables). This is very flexible and easy to work with. But, it requires a union of the three tables to assemble an overall list.

OO databases go through the same stuff and come up with pretty much the same options.
If the point is to model subclasses in a database, you probably are already thinking along the lines of the solutions I've seen in real OO databases (leaving fields empty).
If not, you might think about creating a system that doesn't use inheritance in this way.
Inheritance should always be used quite sparingly, and this is probably a pretty bad case for it.
A good guideline is to never use inheritance unless you actually have code that does different things to the field of a "Parent" class than to the same field in a "Child" class. If business code in your class doesn't specifically refer to a field, that field absolutely shouldn't cause inheritance.
But again, if you are in school, that may not match what they are trying to teach...

The "correct" answer for the purposes of an assignment is probably #3 :
Person
PersonId Name Address1 Address2 City Country
Student
PersonId StudentId GPA Year ..
Staff
PersonId StaffId Salary ..
Parent
PersonId ParentId ParentType EmergencyContactNumber ..
Where PersonId is always the primary key, and also a foreign key in the last three tables.
I like this approach because it makes it easy to represent the same person having more than one role. A teacher could very well also be a parent, for example.

I suggest five tables
Person
Student
Staff
Parent
Address
WHy - because people can have multiple addesses and people can also have multiple roles and the information you want for staff is different than the information you need to store for parent or student.
Further you may want to store name as last_name, Middle_name, first_name, Name_suffix (like jr.) instead of as just name. Belive me you willwant to be able to search on last_name! Name is not unique, so you will need to make sure you have a unique surrogate primary key.
Please read up about normalization before trying to design a database. Here is a source to start with:
http://www.deeptraining.com/litwin/dbdesign/FundamentalsOfRelationalDatabaseDesign.aspx

Super type Person should be created like this:
CREATE TABLE Person(PersonID int primary key, Name varchar ... etc ...)
All Sub types should be created like this:
CREATE TABLE IF NOT EXISTS Staffs(StaffId INT NOT NULL ,
PRIMARY KEY (StaffId) ,
CONSTRAINT FK_StaffId FOREIGN KEY (StaffId) REFERENCES Person(PersonId)
)
CREATE TABLE IF NOT EXISTS Students(StudentId INT NOT NULL ,
PRIMARY KEY (StudentId) ,
CONSTRAINT FK_StudentId FOREIGN KEY (StudentId) REFERENCES Person(PersonId)
)
CREATE TABLE IF NOT EXISTS Parents(PersonID INT NOT NULL ,
PRIMARY KEY (PersonID ) ,
CONSTRAINT FK_PersonID FOREIGN KEY (PersonID ) REFERENCES Person(PersonId)
)
Foreign key in subtypes staffs,students,parents adds two conditions:
Person row cannot be deleted unless corresponding subtype row will
not be deleted. For e.g. if there is one student entry in students
table referring to Person table, without deleting student entry
person entry cannot be deleted, which is very important. If Student
object is created then without deleting Student object we cannot
delete base Person object.
All base types have foreign key "not null" to make sure each base
type will have base type existing always. For e.g. If you create
Student object you must create Person object first.

Many-to-Many with "Primary"

I'm working on a database that needs to represent computers and their users. Each computer can have multiple users and each user can be associated with multiple computers, so it's a classic many-to-many relationship. However, there also needs to be a concept of a "primary" user. I have to be able to join against the primary user to list all computers with their primary users. I'm not sure what the best way structure this in the database:
1) As I'm currently doing: linking table with a boolean IsPrimary column. Joining requires something like ON (c.computer_id = l.computer_id AND l.is_primary = 1). It works, but it feels wrong because it's not easy to constrain the data to only have one primary user per computer.
2) A field on the computer table that points directly at a user row, all rows in the user table represent non-primary users. This represents the one-primary-per-computer constraint better, but makes getting a list of computer-users harder.
3) A field on the computer table linking to a row in the linking table. Feels strange...
4) Something else?
What is the 'relational' way to describe this relationship?
EDIT:
#Mark Brackett: The third option seems a lot less strange to me now that you've shown how nice it can look. For some reason I didn't even think of using a compound foreign key, so I was thinking I'd have to add an identity column on the linking table to make it work. Looks great, thanks!
#j04t: Cool, I'm glad we agree on #3 now.

Option 3, though it may feel strange, is the closest to what you want to model. You'd do something like:
User {
UserId
PRIMARY KEY (UserId)
}
Computer {
ComputerId, PrimaryUserId
PRIMARY KEY (UserId)
FOREIGN KEY (ComputerId, PrimaryUserId)
REFERENCES Computer_User (ComputerId, UserId)
}
Computer_User {
ComputerId, UserId
PRIMARY KEY (ComputerId, UserId)
FOREIGN KEY (ComputerId)
REFERENCES Computer (ComputerId)
FOREIGN KEY (UserId)
REFERENCES User (UserId)
}
Which gives you 0 or 1 primary user (the PrimaryUserId can be nullable if you want), that must be in Computer_User. Edit: If a user can only be primary for 1 computer, then a UNIQUE CONSTRAINT on Computer.PrimaryUserId will enforce that. Note that there is no requirement that all users be a primary on some computer (that would be a 1:1 relationship, and would call for them to be in the same table).
Edit: Some queries to show you the simplicity of this design
--All users of a computer
SELECT User.*
FROM User
JOIN Computer_User ON
User.UserId = Computer_User.UserId
WHERE
Computer_User.ComputerId = #computerId
--Primary user of a computer
SELECT User.*
FROM User
JOIN Computer ON
User.UserId = Computer.PrimaryUserId
WHERE
Computer.ComputerId = #computerId
--All computers a user has access to
SELECT Computer.*
FROM Computer
JOIN Computer_User ON
Computer.ComputerId = Computer_User.ComputerId
WHERE
Computer_User.UserId = #userId
--Primary computer for a user
SELECT Computer.*
FROM Computer
WHERE
PrimaryUserId = #userId

Edit --
I didn't think properly about it the first 3 times through...
I vote for --
(Number 3 solution)
Users
user id (pk)
Computers
computer id (pk)
primary user id (fk -> computer users id)
Computer Users
user id (pk) (fk -> user id)
computer id (pk) (fk -> user id)
This is the best solution I can think of.
Why I like this design.
1) Since this is a relationship involving computers and users I like the idea of being able to associate a user to multiple computers as the primary user. This may not ever occur where this database is being used though.
2) The reason I don't like having the primary_user on the link table
(computer_users.primary_user_id fk-> users.user_id)
is to prevent a computer from ever having multiple primary users.
Given those reasons Number 3 solution looks better since you will never run into some possible problems I see with the other approaches.
Solution 1 problem - Possible to have multiple primary users per computer.
Solution 2 problem - Computer links to a primary user when the computer and user aren't link to each other.
computer.primaryUser = user.user_id
computer_users.user_id != user.user_id
Solution 3 problem - It does seem kind of odd doesn't it? Other than that I can't think of anything.
Solution 4 problem - I can't think of any other way of doing it.
This is the 4th edit so I hope it makes sense still.

Since the primary user is a function of the computer and the user I would tend to go with your approach of having the primaryUser being a column on the linking table.
The other alternative that I can think of is to have a primaryUser column directly on the computer table itself.

I would have made another table PRIMARY_USERS with unique on computer_id and making both computer_id and user_id foreign keys of USERS.

Either solution 1 or 2 will work. At this point I would ask myself which one will be easier to work with. I've used both methods in different situations though I would generally go with a flag on the linking table and then force a unique constraint on computer_id and isPrimaryUser, that way you ensure that each computer will only have one primary user.

2 feels right to me, but I would test out 1, 2 and 3 for performance on the sorts of queries you normally perform and the sorts of data volumes you have.
As a general rule of thumb I tend to believe that where there is a choice of implementations you should look to your query requirements and design your schema so you get the best performance and resource utilisation in the most common case.
In the rare situation where you have equally common cases which suggest opposite implementations, then use Occam's razor.

We have a similar situation in the application I work on where we have Accounts that can have many Customers attached but only one should be the Primary customer.
We use a link table (as you have) but have a Sequence value on the link table. The Primary user is the one with Sequence = 1. Then, we have an Index on that Link table for AccountID and Sequence to ensure that the combination of AccountID and Sequence is unique (thereby ensuring that no two Customers can be the Primary one on an Account). So you would have:
LEFT JOIN c.computer_id = l.computer_id AND l.sequence = 1

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight