I'm working on a database that needs to represent computers and their users. Each computer can have multiple users and each user can be associated with multiple computers, so it's a classic many-to-many relationship. However, there also needs to be a concept of a "primary" user. I have to be able to join against the primary user to list all computers with their primary users. I'm not sure what the best way is to structure this in the database:
1) As I'm currently doing: linking table with a boolean IsPrimary column. Joining requires something like ON (c.computer_id = l.computer_id AND l.is_primary = 1). It works, but it feels wrong because it's not easy to constrain the data to only have one primary user per computer.
2) A field on the computer table that points directly at a user row, all rows in the user table represent non-primary users. This represents the one-primary-per-computer constraint better, but makes getting a list of computer-users harder.
3) A field on the computer table linking to a row in the linking table. Feels strange...
4) Something else?
What is the 'relational' way to describe this relationship?
EDIT:
@Mark Brackett: The third option seems a lot less strange to me now that you've shown how nice it can look. For some reason I didn't even think of using a compound foreign key, so I was thinking I'd have to add an identity column on the linking table to make it work. Looks great, thanks!
@j04t: Cool, I'm glad we agree on #3 now.
Option 3, though it may feel strange, is the closest to what you want to model. You'd do something like:
User {
UserId
PRIMARY KEY (UserId)
}
Computer {
ComputerId, PrimaryUserId
PRIMARY KEY (ComputerId)
FOREIGN KEY (ComputerId, PrimaryUserId)
REFERENCES Computer_User (ComputerId, UserId)
}
Computer_User {
ComputerId, UserId
PRIMARY KEY (ComputerId, UserId)
FOREIGN KEY (ComputerId)
REFERENCES Computer (ComputerId)
FOREIGN KEY (UserId)
REFERENCES User (UserId)
}
This gives you 0 or 1 primary user per computer (PrimaryUserId can be nullable if you want), and that user must also appear in Computer_User for that computer. Edit: If a user can only be primary for 1 computer, then a UNIQUE CONSTRAINT on Computer.PrimaryUserId will enforce that. Note that there is no requirement that all users be primary on some computer (that would be a 1:1 relationship, and would call for them to be in the same table).
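A runnable sketch of option 3, assuming SQLite through Python's standard sqlite3 module as a stand-in for whatever DBMS you use. The User table is renamed Users here (USER is a reserved word in several DBMSs); the other names follow the schema above. It shows the composite foreign key accepting a linked user as primary and refusing one who is not linked to the computer:

```python
import sqlite3

# Option 3 in SQLite. "Users" stands in for "User" (a reserved word in several DBMSs).
SCHEMA = """
CREATE TABLE Users (
    UserId INTEGER PRIMARY KEY
);
CREATE TABLE Computer (
    ComputerId    INTEGER PRIMARY KEY,
    PrimaryUserId INTEGER NULL,
    -- The primary user must be an existing (computer, user) link.
    FOREIGN KEY (ComputerId, PrimaryUserId)
        REFERENCES Computer_User (ComputerId, UserId)
);
CREATE TABLE Computer_User (
    ComputerId INTEGER NOT NULL REFERENCES Computer (ComputerId),
    UserId     INTEGER NOT NULL REFERENCES Users (UserId),
    PRIMARY KEY (ComputerId, UserId)
);
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Users (UserId) VALUES (?)", [(1,), (2,), (3,)])
    # Insert the computer with no primary user first: a NULL in the composite FK skips the check.
    conn.execute("INSERT INTO Computer (ComputerId) VALUES (10)")
    conn.executemany("INSERT INTO Computer_User VALUES (?, ?)", [(10, 1), (10, 2)])
    return conn


def set_primary(conn: sqlite3.Connection, computer_id: int, user_id: int) -> bool:
    """Try to make user_id the primary user; returns False if the FK rejects it."""
    try:
        conn.execute(
            "UPDATE Computer SET PrimaryUserId = ? WHERE ComputerId = ?",
            (user_id, computer_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


def computers_with_primary_users(conn: sqlite3.Connection) -> list:
    # A plain join on Computer.PrimaryUserId; no IsPrimary filter is needed.
    return conn.execute(
        "SELECT c.ComputerId, u.UserId FROM Computer c "
        "LEFT JOIN Users u ON u.UserId = c.PrimaryUserId ORDER BY c.ComputerId"
    ).fetchall()


if __name__ == "__main__":
    conn = build_db()
    print(set_primary(conn, 10, 1))  # user 1 is linked to computer 10: accepted
    print(set_primary(conn, 10, 3))  # user 3 is not linked: rejected by the FK
    print(computers_with_primary_users(conn))
```

SQLite accepts a foreign key to a table that is created later, so the circular reference between Computer and Computer_User needs no extra step here; in other DBMSs you would typically add the Computer foreign key with ALTER TABLE once both tables exist.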
Edit: Some queries to show you the simplicity of this design
--All users of a computer
SELECT User.*
FROM User
JOIN Computer_User ON
User.UserId = Computer_User.UserId
WHERE
Computer_User.ComputerId = @computerId
--Primary user of a computer
SELECT User.*
FROM User
JOIN Computer ON
User.UserId = Computer.PrimaryUserId
WHERE
Computer.ComputerId = @computerId
--All computers a user has access to
SELECT Computer.*
FROM Computer
JOIN Computer_User ON
Computer.ComputerId = Computer_User.ComputerId
WHERE
Computer_User.UserId = @userId
--Primary computer for a user
SELECT Computer.*
FROM Computer
WHERE
PrimaryUserId = @userId
Edit --
I didn't think properly about it the first 3 times through...
I vote for --
(Number 3 solution)
Users
user id (pk)
Computers
computer id (pk)
primary user id (fk: (computer id, primary user id) -> computer users (computer id, user id))
Computer Users
user id (pk) (fk -> user id)
computer id (pk) (fk -> computer id)
This is the best solution I can think of.
Why I like this design.
1) Since this is a relationship involving computers and users I like the idea of being able to associate a user to multiple computers as the primary user. This may not ever occur where this database is being used though.
2) I don't like having the primary_user on the link table
(computer_users.primary_user_id fk -> users.user_id)
because I want to prevent a computer from ever having multiple primary users.
Given those reasons Number 3 solution looks better since you will never run into some possible problems I see with the other approaches.
Solution 1 problem - Possible to have multiple primary users per computer.
Solution 2 problem - A computer can link to a primary user even when the computer and user aren't linked to each other.
computer.primaryUser = user.user_id
computer_users.user_id != user.user_id
Solution 3 problem - It does seem kind of odd doesn't it? Other than that I can't think of anything.
Solution 4 problem - I can't think of any other way of doing it.
This is the 4th edit so I hope it makes sense still.
Since the primary user is a function of the computer and the user I would tend to go with your approach of having the primaryUser being a column on the linking table.
The other alternative that I can think of is to have a primaryUser column directly on the computer table itself.
I would have made another table, PRIMARY_USERS, with a unique constraint on computer_id, making computer_id and user_id foreign keys to COMPUTERS and USERS respectively.
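Either solution 1 or 2 will work. At this point I would ask myself which one will be easier to work with. I've used both methods in different situations, though I would generally go with a flag on the linking table and then enforce a unique constraint on computer_id and isPrimaryUser; that way you ensure that each computer will only have one primary user. (A plain unique constraint on that pair would also allow only one non-primary user per computer, so in practice this needs a filtered/partial unique index that covers only the rows where isPrimaryUser is true.)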
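A minimal sketch of the flag approach, assuming SQLite via Python's sqlite3; the table and column names are hypothetical. It uses a partial unique index so that uniqueness applies only to rows flagged as primary (SQL Server calls these filtered indexes and PostgreSQL also supports them, but support varies by DBMS):

```python
import sqlite3

# Hypothetical names; is_primary is the boolean flag from option 1.
SCHEMA = """
CREATE TABLE computer_users (
    computer_id INTEGER NOT NULL,
    user_id     INTEGER NOT NULL,
    is_primary  INTEGER NOT NULL DEFAULT 0 CHECK (is_primary IN (0, 1)),
    PRIMARY KEY (computer_id, user_id)
);
-- Partial unique index: only rows with is_primary = 1 take part in the
-- uniqueness check, so non-primary users are unrestricted.
CREATE UNIQUE INDEX one_primary_per_computer
    ON computer_users (computer_id) WHERE is_primary = 1;
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_user(conn, computer_id: int, user_id: int, is_primary: int = 0) -> bool:
    """Insert a link row; returns False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO computer_users VALUES (?, ?, ?)",
            (computer_id, user_id, is_primary),
        )
        return True
    except sqlite3.IntegrityError:
        return False


def primary_users(conn) -> list:
    # The is_primary = 1 condition from option 1, as a filter on the link table.
    return conn.execute(
        "SELECT computer_id, user_id FROM computer_users "
        "WHERE is_primary = 1 ORDER BY computer_id"
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    print(add_user(conn, 10, 1, is_primary=1))  # first primary: accepted
    print(add_user(conn, 10, 2))                # non-primary: accepted
    print(add_user(conn, 10, 3))                # another non-primary: still accepted
    print(add_user(conn, 10, 4, is_primary=1))  # second primary: rejected
    print(primary_users(conn))
```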
2 feels right to me, but I would test out 1, 2 and 3 for performance on the sorts of queries you normally perform and the sorts of data volumes you have.
As a general rule of thumb I tend to believe that where there is a choice of implementations you should look to your query requirements and design your schema so you get the best performance and resource utilisation in the most common case.
In the rare situation where you have equally common cases which suggest opposite implementations, then use Occam's razor.
We have a similar situation in the application I work on where we have Accounts that can have many Customers attached but only one should be the Primary customer.
We use a link table (as you have) but have a Sequence value on the link table. The Primary user is the one with Sequence = 1. Then, we have an Index on that Link table for AccountID and Sequence to ensure that the combination of AccountID and Sequence is unique (thereby ensuring that no two Customers can be the Primary one on an Account). So you would have:
LEFT JOIN computer_user l ON c.computer_id = l.computer_id AND l.sequence = 1
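The Sequence idea can be sketched the same way, again assuming SQLite via Python's sqlite3; account_customers and its columns are hypothetical names. The UNIQUE (account_id, sequence) constraint plays the role of the index described above and stops two customers from both being Sequence = 1 on the same account:

```python
import sqlite3

# Hypothetical names modelled on the Accounts/Customers example above.
SCHEMA = """
CREATE TABLE account_customers (
    account_id  INTEGER NOT NULL,
    customer_id INTEGER NOT NULL,
    sequence    INTEGER NOT NULL,
    PRIMARY KEY (account_id, customer_id),
    -- No two customers on one account may share a Sequence value,
    -- so at most one of them can be Sequence = 1 (the primary).
    UNIQUE (account_id, sequence)
);
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_customer(conn, account_id: int, customer_id: int, sequence: int) -> bool:
    """Insert a link row; returns False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO account_customers VALUES (?, ?, ?)",
            (account_id, customer_id, sequence),
        )
        return True
    except sqlite3.IntegrityError:
        return False


def primary_customer(conn, account_id: int):
    """Return the customer with Sequence = 1, or None if there isn't one."""
    row = conn.execute(
        "SELECT customer_id FROM account_customers "
        "WHERE account_id = ? AND sequence = 1",
        (account_id,),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = make_db()
    print(add_customer(conn, 1, 100, 1))  # primary customer: accepted
    print(add_customer(conn, 1, 200, 2))  # secondary customer: accepted
    print(add_customer(conn, 1, 300, 1))  # second Sequence = 1: rejected
    print(primary_customer(conn, 1))
```

A side benefit of this design is that Sequence also gives the non-primary customers an order, which a boolean flag does not.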
I am trying to figure out the best way to set up my Entity Diagram. I will explain based on the image below.
tblParentCustomer: This table stores information for our Primary Customers, which can be either a Business or a Consumer. (They are identified using a lookup table tblCustomerType.)
tblChildCustomer: This table stores customers that are under the Primary Customer. The Primary Business customers can have Authorized Employees and Authorized Reps. The Primary Consumer customers can have Authorized Users. (They are identified using a lookup table tblCustomerType.)
tblChildAccountNumber: This table stores AccountNumbers for tblChildCustomer. These account numbers are mainly for the Child Business Customers. I may be adding Account Numbers for the Child Consumer customers, I am not sure yet, but I believe this design will allow for that if/when necessary.
Going back to tblParentCustomer: If this customer is a Consumer, I will need to add account numbers for them. My question is, do I create a 1-Many relationship between tblParentCustomer and tblParentAccountNumber? This option would give me 2 different Account Number tables.
Or would it make sense to create a Junction Account Table that intersects tblParentCustomer and tblChildCustomer?
The first option doesn't really make sense to me because what if there is only 1 Account number for a customer but multiple childCustomers?
Does it make sense to have 2 similar Account Tables that serve a different purpose?
To create the many-to-many relationship the way you want it, you need a link table that turns it into a 1-* relationship followed by a *-1 relationship.
That link table will have two FKs, one linking to the parent table and one linking to the child table. The combination of those two FKs gives you a composite PK (this is important to avoid duplicates). It will allow any customer to be part of as many accounts as needed (duh, that's what makes the parent/child relationship many-to-many).
This approach is extremely common for CRMs or any accounts containing people. Taking it one step further, you might want to add an "is primary contact" column to that link table (AccountMembers). Drop the childAccountNumber table; you don't need it.
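A sketch of that link table, assuming SQLite via Python's sqlite3. Only the name AccountMembers comes from the answer; Accounts, Customers, their columns and the sample rows are hypothetical stand-ins for your parent/child tables. The composite primary key is what rejects a duplicate membership, while the same customer can still belong to several accounts:

```python
import sqlite3

# Accounts and Customers are hypothetical stand-ins for the parent/child tables.
SCHEMA = """
CREATE TABLE Accounts (
    account_id     INTEGER PRIMARY KEY,
    account_number TEXT NOT NULL UNIQUE
);
CREATE TABLE Customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE AccountMembers (
    account_id         INTEGER NOT NULL REFERENCES Accounts (account_id),
    customer_id        INTEGER NOT NULL REFERENCES Customers (customer_id),
    is_primary_contact INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (account_id, customer_id)  -- composite PK: no duplicate memberships
);
INSERT INTO Accounts VALUES (1, 'ACC-1'), (2, 'ACC-2');
INSERT INTO Customers VALUES (100, 'Business customer'), (200, 'Authorized rep');
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def add_member(conn, account_id: int, customer_id: int, is_primary_contact: int = 0) -> bool:
    """Link a customer to an account; returns False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO AccountMembers VALUES (?, ?, ?)",
            (account_id, customer_id, is_primary_contact),
        )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = make_db()
    print(add_member(conn, 1, 100, 1))  # primary contact of account 1
    print(add_member(conn, 2, 100))     # same customer on a second account: allowed
    print(add_member(conn, 1, 100))     # duplicate membership: rejected by the PK
```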
I have a data schema similar to the following:
USERS:
id
name
email
phone number
...
PHOTOS:
id
width
height
filepath
...
I have an auditing table for any changes to the system
LOGS:
id
acting_user
date
record_type (enum: "users", "photos", "...")
record_id
record_field
new_value
Is there a name for this setup where an enum in one of the fields refers to the name of one of the other tables? And effectively, record_type and record_id together are a foreign key to the record in the other table. Is this an anti-pattern? (Note: new_value and all the things we would be logging are the same data type: strings.)
Is this an anti-pattern?
Yes. Any pattern that makes you enforce referential integrity manually1 is an anti-pattern.
Here is why using FOREIGN KEYs is so important and here is what to do in cases like yours.
Is there a name for this setup where an enum in one of the fields refers to the name of one of the other tables?
There is no standard term that I know of, but I have heard people call it "generic" or "polymorphic" FKs.
1 As opposed to FOREIGN KEYs built-into the DBMS.
Actually, I think 'Anti-Pattern' is a pretty good name for this set up, but it can be a realistic way to go - especially in this example.
I'll add a similar example with a new table which records LIKES of users, photos, etc., and show why it's bad. Then I'll explain why it might not be too bad for your LOGS example.
The LIKES table is:
Id
LikedByUserId
RecordType ("users", "photos", "...")
RecordId
This is pretty much the same as the LOGS table. The problem with this is that you cannot make RecordId a foreign key to the USERS table as well as to the PHOTOS table and any other tables: if User 1234 is being liked, you couldn't insert it unless there was also a PHOTO with ID 1234, and so on. For this reason, none of the RDBMSs that I know of will let a single foreign key point at more than one table's primary key; after all, Primary means 'only one', amongst other things.
So you'd have to create the LIKES table with no referential integrity. This may not be a bad thing sometimes, but in this case I think I'd want an important table such as LIKES to have valid entries.
To do LIKES properly, I would create the table as:
Id
LikedByUserId (allow null)
PhotoId (allow null)
OtherThingId (allow null)
...and create the appropriate foreign keys. This will actually make queries that read the data easier to read and maintain and probably more efficient too.
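A sketch of that LIKES design, assuming SQLite via Python's sqlite3. LikedUserId is a hypothetical column for liking a user (the list above only shows PhotoId and OtherThingId), and the CHECK constraint requiring exactly one target is an addition, not part of the answer. Each target column gets its own real foreign key, so liking User 1234 no longer needs a Photo 1234:

```python
import sqlite3

SCHEMA = """
CREATE TABLE Users  (Id INTEGER PRIMARY KEY);
CREATE TABLE Photos (Id INTEGER PRIMARY KEY);
CREATE TABLE Likes (
    Id            INTEGER PRIMARY KEY,
    LikedByUserId INTEGER NOT NULL REFERENCES Users (Id),
    LikedUserId   INTEGER NULL REFERENCES Users (Id),   -- hypothetical: a liked user
    PhotoId       INTEGER NULL REFERENCES Photos (Id),  -- a liked photo
    -- Exactly one target column must be filled in.
    CHECK ((LikedUserId IS NOT NULL AND PhotoId IS NULL)
        OR (LikedUserId IS NULL AND PhotoId IS NOT NULL))
);
INSERT INTO Users (Id) VALUES (1), (1234);
INSERT INTO Photos (Id) VALUES (7);
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def like(conn, by_user: int, user_id=None, photo_id=None) -> bool:
    """Record a like; returns False if an FK or the CHECK rejects it."""
    try:
        conn.execute(
            "INSERT INTO Likes (LikedByUserId, LikedUserId, PhotoId) VALUES (?, ?, ?)",
            (by_user, user_id, photo_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = make_db()
    print(like(conn, 1, user_id=1234))   # user 1234 exists: accepted
    print(like(conn, 1, photo_id=7))     # photo 7 exists: accepted
    print(like(conn, 1, photo_id=1234))  # no photo 1234: rejected by the FK
```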
However, for a table like LOGS, which probably isn't central to the functionality of my system and which I only query ad hoc to check what's been happening, I might not want to put in the extra effort and complexity that the more efficient-to-read design needs. I'm not sure I would actually skip it, though. It is an anti-pattern, but depending on usage it might be OK.
To emphasise the point, I would only do this if the system never queried the table; if the only people who look at the data are admins running ad-hoc queries against it, then it might be OK.
Cheers -
What would be the best way to do this and why?
Here is a quick look at a part of my database design, I'm looking for the best way to organize this data.
"Leads" has many "Students", Leads has many "Contacts"
"Students" belongs to "Leads" and belongs to "People"
id, person_id, lead_id
"Contacts" belongs to "Leads" and belongs to "People"
id, person_id, lead_id
I want to be able to signify which contact is going to be a "payer" and if a contact would be the primary contact or not.
I thought originally I would add two more tables like this:
"PrimaryContacts" belongs to "Contacts"
id, contact_id
"Payer" belongs to "Contacts"
id, contact_id
Then I realized it seems kind of overkill to add two more tables for something I can easily represent in the initial Contacts table, like this:
"Contacts"
id, person_id, lead_id, type, payer
Then I could have type be 1 or 2, meaning primary or secondary, and then the payer field would be 1 or 2 meaning they either are paying or they aren't.
Is there a benefit of doing it one way or the other or does it matter at all?
Thanks!
I have to admit I'm a little confused by your requirements, but interpreting literally what you say seems to lead to the following database model:
The Contacts.payer flag enables you to have any number of payers, regardless of their primary status.
There really is no need for a separate Payer table in this case.
The Leads.primary_contact_id is a NULL-able FK towards the Contacts, which is what lets you have 0 or 1 primary contact per lead (to avoid the possibility of 0 primary contacts, you'd need a NOT NULL, but this would lead to an insertion cycle, which would have to be resolved through deferred constraints, which are not supported in MySQL).
However, this doesn't guarantee that the primary contact belongs to its own lead (i.e. Contacts.lead_id could be different from Leads.lead_id even when Contacts.contact_id matches Leads.primary_contact_id). Is that a problem? If yes, you'd need a liberal application of identifying relationships and composite PKs, which could be a problem for ORMs.
A separate PrimaryContacts table would have a very similar effect to Leads.primary_contact_id (assuming you got your PK right), and would even have the same problems of allowing 0 primary contacts and lead mismatches. Just having a "backward" FK is simpler and more efficient from the database perspective (though I'm not sure if that's still true from the CakePHP perspective).
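If the lead mismatch does matter, one alternative to full identifying relationships is to keep the surrogate PKs, add UNIQUE (lead_id, contact_id) to Contacts, and point a composite FK from Leads at that pair. This is a sketch under those assumptions, using SQLite via Python's sqlite3, with the People table omitted and the sample IDs made up; it is not the answer's own model, and how well it maps to CakePHP is untested:

```python
import sqlite3

SCHEMA = """
CREATE TABLE Leads (
    lead_id            INTEGER PRIMARY KEY,
    primary_contact_id INTEGER NULL,
    -- The primary contact must be a contact of this same lead.
    FOREIGN KEY (lead_id, primary_contact_id)
        REFERENCES Contacts (lead_id, contact_id)
);
CREATE TABLE Contacts (
    contact_id INTEGER PRIMARY KEY,
    lead_id    INTEGER NOT NULL REFERENCES Leads (lead_id),
    person_id  INTEGER NOT NULL,           -- FK to People omitted for brevity
    payer      INTEGER NOT NULL DEFAULT 0,
    UNIQUE (lead_id, contact_id)           -- target for the composite FK above
);
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    # Leads first with no primary contact (a NULL skips the FK), then their contacts.
    conn.executemany("INSERT INTO Leads (lead_id) VALUES (?)", [(1,), (2,)])
    conn.executemany(
        "INSERT INTO Contacts (contact_id, lead_id, person_id) VALUES (?, ?, ?)",
        [(10, 1, 500), (20, 2, 501)],
    )
    return conn


def set_primary_contact(conn, lead_id: int, contact_id: int) -> bool:
    """Returns False if the contact does not belong to this lead."""
    try:
        conn.execute(
            "UPDATE Leads SET primary_contact_id = ? WHERE lead_id = ?",
            (contact_id, lead_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


def primary_contacts(conn) -> list:
    return conn.execute(
        "SELECT lead_id, primary_contact_id FROM Leads ORDER BY lead_id"
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    print(set_primary_contact(conn, 1, 10))  # contact 10 belongs to lead 1: accepted
    print(set_primary_contact(conn, 1, 20))  # contact 20 belongs to lead 2: rejected
    print(primary_contacts(conn))
```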
Unfortunately, I'm not familiar with CakePHP - hopefully you'll be able to "translate" this model there on your own.
I am building a database as a simple exercise; it could be hosted on any database server, so I am trying to keep things as standard as possible. Basically, what I would like to do is create a 'code' table that gets referenced by other entities. Let me explain:
xcode
id code
r role
p property
code
r admin
r staff
p title
....
then I would like to have some view like:
role (select * from code where xcode='r')
r admin
r staff
property (select * from code where xcode='p')
p title
then, suppose we have an entity
myentity
id - 1
role - admin (foreign key to role)
title - title (foreign key to property)
Obviously I cannot create a foreign key to a view, but this illustrates the idea I have in mind. How can I implement this behaviour using standard SQL syntax wherever possible, and, as a second option, additional database features such as triggers, etc.?
Because if I declare role and title in myentity as foreign keys to 'code' instead of to the views, nothing would stop me from inserting a role into the title field.
I have worked on systems with a single table for all codes and others with one table per code. I definitely prefer the latter approach.
The advantages of a table per code are:
Foreign keys. As you have already spotted it is not possible to enforce compliance to permitted values through foreign keys with a single table. Using check constraints is an alternative approach but it has a higher maintenance cost.
Performance. Code lookups are not normally a performance bottleneck, but it undoubtedly helps the optimizer make sensible decisions about execution paths if it knows it is retrieving records from a table with four rows rather than four hundred.
Code groups. Sometimes we want to organise a code into sub-divisions, usually to make it easier to render complex lists of values. If we have a table per code we have more flexibility when it comes to structure.
In addition, I notice that you want to be able to deploy "on any database server". In that case, avoid triggers. Triggers are bad news in most scenarios anyway, and on top of that they have product-specific syntax.
What you are trying to do is, in most cases, an anti-pattern and a design mistake. Just create separate tables instead of views.
There are some rare cases where this kind of design makes sense. In those cases, include the xcode field in the primary key/foreign key, so your entity will look like this:
myentity
id - 1
role_xcode
role - admin (foreign key to role)
title_xcode
title - title (foreign key to property)
You then can create check constraints to enforce role_xcode='r' and title_xcode='p'
(Sorry, I don't know if they are standard; they do exist in Oracle and are so simple that I'd expect them in other RDBMSs as well.)
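A sketch of that design, assuming SQLite via Python's sqlite3; CHECK constraints and composite foreign keys are both part of standard SQL, and the column types here are assumptions. The composite FK keeps each value inside the code table, and the CHECK constraints are what keep a role out of the title column:

```python
import sqlite3

SCHEMA = """
CREATE TABLE xcode (
    id   CHAR(1) PRIMARY KEY,
    code VARCHAR(30) NOT NULL
);
CREATE TABLE code (
    xcode CHAR(1) NOT NULL REFERENCES xcode (id),
    code  VARCHAR(30) NOT NULL,
    PRIMARY KEY (xcode, code)
);
CREATE TABLE myentity (
    id          INTEGER PRIMARY KEY,
    role_xcode  CHAR(1) NOT NULL DEFAULT 'r' CHECK (role_xcode = 'r'),
    role        VARCHAR(30) NOT NULL,
    title_xcode CHAR(1) NOT NULL DEFAULT 'p' CHECK (title_xcode = 'p'),
    title       VARCHAR(30) NOT NULL,
    FOREIGN KEY (role_xcode, role)   REFERENCES code (xcode, code),
    FOREIGN KEY (title_xcode, title) REFERENCES code (xcode, code)
);
INSERT INTO xcode VALUES ('r', 'role'), ('p', 'property');
INSERT INTO code VALUES ('r', 'admin'), ('r', 'staff'), ('p', 'title');
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def insert_entity(conn, entity_id, role, title, role_xcode="r", title_xcode="p") -> bool:
    """Returns False if the FK or a CHECK constraint rejects the row."""
    try:
        conn.execute(
            "INSERT INTO myentity (id, role_xcode, role, title_xcode, title) "
            "VALUES (?, ?, ?, ?, ?)",
            (entity_id, role_xcode, role, title_xcode, title),
        )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = make_db()
    print(insert_entity(conn, 1, "admin", "title"))  # valid role and title
    print(insert_entity(conn, 2, "title", "title"))  # 'title' is not a role: FK rejects
    # ('r', 'staff') exists in code, so only the CHECK stops a role in the title column.
    print(insert_entity(conn, 3, "admin", "staff", title_xcode="r"))
```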
For example, let's say I have an entity called user and an entity called profile_picture. A user may have none or one profile picture.
So I thought, I would just create a table called "user" with this fields:
user: user_id, profile_picture_id
(I left all other attributes like name, email, etc. away, to simplify this)
OK, so if a user had no profile_picture, its id would be NULL in my relational model. Now someone told me that I have to avoid setting anything to NULL, because NULL is "bad".
What do you think about this? Do I have to take off that profile_picture_id from the user table and create a link-table like user__profile_picture with user_id, profile_picture_id?
Which would be considered to be "better practice" in database design?
This is a perfectly reasonable model. True, you can take the approach of creating a join table for a 1:1 relationship (or, somewhat better, you could put user_id in the profile_picture table), but unless you think that very few users will have profile pictures then that's likely a needless complication.
Readability is an important component in relational design. Do you consider the profile picture to be an attribute of the user, or the user to be an attribute of the profile picture? You start from what makes logical sense, then optimize away the intuitive design as you find it necessary through performance testing. Don't prematurely optimize.
NULL isn't "bad". It means "I don't know." It's not wrong for you or your schema to admit it.
"NULL is bad" is a rather poor excuse for a reason to do (or not do) something.
That said, you may want to model this as a dependent table, where the user_id is both the primary key and a foreign key to the existing table.
Something like this:
Users                     UserPicture                   Picture
----------------          --------------------          -------------------
| User_Id (PK) |__________| User_Id (PK, FK) |__________| Picture_Id (PK) |
| ...          |          | Picture_Id (FK)  |          | ...             |
----------------          --------------------          -------------------
Or, if pictures are dependent objects (don't have a meaningful lifetime independent of users) merge the UserPicture and Picture tables, with User_Id as the PK and discard the Picture_Id.
Actually, looking at it again, this really doesn't gain you anything - you have to do a left join vs. having a null column, so the other scenario (put the User_Id in the Picture table) or just leave the Picture_Id right in the Users table both make just as much sense.
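A quick way to see this, assuming SQLite via Python's sqlite3 and the table names from the diagram (the sample IDs are made up): once you LEFT JOIN the dependent table, a user without a picture still comes back with a NULL (None in Python) Picture_Id:

```python
import sqlite3

SCHEMA = """
CREATE TABLE Users   (User_Id INTEGER PRIMARY KEY);
CREATE TABLE Picture (Picture_Id INTEGER PRIMARY KEY);
CREATE TABLE UserPicture (
    -- PK and FK at once: at most one picture row per user.
    User_Id    INTEGER PRIMARY KEY REFERENCES Users (User_Id),
    Picture_Id INTEGER NOT NULL REFERENCES Picture (Picture_Id)
);
INSERT INTO Users VALUES (1), (2);
INSERT INTO Picture VALUES (100);
INSERT INTO UserPicture VALUES (1, 100);  -- user 2 has no picture
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def pictures_by_user(conn) -> list:
    # The LEFT JOIN reintroduces the NULL that the extra table was meant to avoid.
    return conn.execute(
        "SELECT u.User_Id, up.Picture_Id FROM Users u "
        "LEFT JOIN UserPicture up ON up.User_Id = u.User_Id ORDER BY u.User_Id"
    ).fetchall()


if __name__ == "__main__":
    print(pictures_by_user(make_db()))
```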
Your user table should not have a nullable field called profile_picture_id. It would be better to have a user_id column in the profile_picture table. It should of course be a foreign key to the user table.
Since when is a nullable foreign key relationship "bad?" Honestly introducing another table here seems kind of silly since there's no possibility to have more than one profile picture. Your current schema is more than acceptable. The "null is bad" argument doesn't hold any water in my book.
If you're looking for a slightly better schema, then you could do something like drop the "profile_picture_id" column from the users table, and then make a "user_id" column in the pictures table with a foreign key relationship back to users. Then you could even enforce a UNIQUE constraint on the user_id foreign key column so that you can't have more than one instance of a user_id in that table.
EDIT: It's also worth noting that this alternate schema could be a little bit more future-proof should you decide to allow users to have more than one profile picture in the future. You can simply drop the UNIQUE constraint on the foreign key and you're done.
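A sketch of that alternate schema, assuming SQLite via Python's sqlite3; the name and filepath columns and the sample rows are hypothetical. The UNIQUE on the foreign key column is what limits each user to one profile picture:

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE profile_picture (
    profile_picture_id INTEGER PRIMARY KEY,
    -- FK back to users; UNIQUE allows at most one picture per user.
    user_id  INTEGER NOT NULL UNIQUE REFERENCES users (user_id),
    filepath TEXT NOT NULL
);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def add_picture(conn, user_id: int, filepath: str) -> bool:
    """Returns False for a second picture or an unknown user."""
    try:
        conn.execute(
            "INSERT INTO profile_picture (user_id, filepath) VALUES (?, ?)",
            (user_id, filepath),
        )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = make_db()
    print(add_picture(conn, 1, "alice.png"))   # first picture: accepted
    print(add_picture(conn, 1, "alice2.png"))  # second picture: rejected by UNIQUE
```

Dropping the UNIQUE from user_id is the only schema change needed if multiple pictures per user are allowed later.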
It is true that having many columns with null values is not recommended. I would suggest you make the picture table a weak entity of user table and have an identifying relationship between the two. Picture table entries would depend on user id.
Make the profile picture a nullable field on the user table and be done with it. Sometimes people normalize just for normalization sake. Null is perfectly fine, and in DB2, NULL is a first class citizen of values with NULL being included in indices.
I agree that NULL is bad. It is not relational-database-style.
Null is avoided by introducing an extra table named UserPictureIds. It would have two columns, UserId and PictureId. If there's none, it simply would not have the respective line, while user is still there in Users table.
Edit due to peer pressure
This answer focuses not on why NULL is bad - but, on how to avoid using NULLs in your database design.
For evaluating (NULL==NULL)==(NULL!=NULL), please refer to comments and google.