Foreign Key Referencing Multiple Tables - sql-server

I have a column with a uniqueidentifier that can potentially reference one of four different tables. I have seen this done in two ways, but both seem like bad practice.
First, I've seen a single ObjectID column without explicitly declaring it as a foreign key to a specific table. Then you can just shove any uniqueidentifier you want in it. This means you could potentially insert IDs from tables that are not part of the 4 tables I wanted.
Second, because the data can come from four different tables, I've also seen people make 4 different foreign keys. And in doing so, the system relies on ONE AND ONLY ONE column having a non-NULL value.
What's a better approach to doing this? For example, records in my table could potentially reference Hospitals(ID), Clinics(ID), Schools(ID), or Universities(ID)... but ONLY those tables.
Thanks!

You might want to consider a Type/SubType data model. This is very much like class/subclasses in object oriented programming, but much more awkward to implement, and no RDBMS (that I am aware of) natively supports them. The general idea is:
You define a Type (Building), create a table for it, give it a primary key
You define two or more sub-types (here, Hospital, Clinic, School, University), create tables for each of them, make primary keys… but the primary keys are also foreign keys that reference the Building table
Your table with one “ObjectType” column can now be built with a foreign key onto the Building table. You’d have to join a few tables to determine what kind of building it is, but you’d have to do that anyway. That, or store redundant data.
You have noticed the problem with this model, right? What’s to keep a Building from having entries in two or more of the subtype tables? Glad you asked:
Add a column, perhaps “BuildingType”, to Building, say char(1) with allowed values of {H, C, S, U} indicating (duh) type of building.
Build a unique constraint on BuildingID + BuildingType
Have the BuildingType column in the subtables. Put a check constraint on it so that it can only ever be set to that table's value (H for the Hospitals table, etc.). In theory, this could be a computed column; in practice, this won't work because of the following step:
Build the foreign key to relate the tables using both columns
Voila: Given a BUILDING row set with type H, an entry in the SCHOOL table (with type S) cannot be set to reference that Building
You will recall that I did say it was hard to implement.
In fact, the big question is: Is this worth doing? If it makes sense to implement the four (or more, as time passes) building types as type/subtype (further normalization advantages: one place for address and other attributes common to every building, with building-specific attributes stored in the subtables), it may well be worth the extra effort to build and maintain. If not, then you’re back to square one: a logical model that is hard to implement in the average modern-day RDBMS.
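As a rough sketch of the steps above, the following uses Python's built-in sqlite3 module as a stand-in for SQL Server (the DDL would need adjusting there). The table and column names follow the answer; showing only two of the four subtypes and giving BuildingType a DEFAULT in the subtables are assumptions made to keep the example short.

```python
import sqlite3

# Type/subtype sketch in SQLite syntax (an assumption; the thread is about SQL Server).
# Clinic and University would follow the same shape as Hospital and School.
BUILDING_SCHEMA = """
CREATE TABLE Building (
    BuildingID   INTEGER PRIMARY KEY,
    BuildingType TEXT NOT NULL CHECK (BuildingType IN ('H', 'C', 'S', 'U')),
    UNIQUE (BuildingID, BuildingType)   -- target of the two-column foreign keys
);
CREATE TABLE Hospital (
    BuildingID   INTEGER PRIMARY KEY,
    BuildingType TEXT NOT NULL DEFAULT 'H' CHECK (BuildingType = 'H'),
    FOREIGN KEY (BuildingID, BuildingType)
        REFERENCES Building (BuildingID, BuildingType)
);
CREATE TABLE School (
    BuildingID   INTEGER PRIMARY KEY,
    BuildingType TEXT NOT NULL DEFAULT 'S' CHECK (BuildingType = 'S'),
    FOREIGN KEY (BuildingID, BuildingType)
        REFERENCES Building (BuildingID, BuildingType)
);
"""


def open_building_db():
    """Create an in-memory database holding the supertype and two subtypes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.executescript(BUILDING_SCHEMA)
    return conn


def building_stmt_accepted(conn, sql):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False
```

With a Building row of type H in place, inserting the same BuildingID into Hospital succeeds, while inserting it into School is rejected by the two-column foreign key.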

Let's start at the conceptual level. If we think of Hospitals, Clinics, Schools, and Universities as classes of subject matter entities, is there a superclass that generalizes all of them? There probably is. I'm not going to try to tell you what it is, because I don't understand your subject matter as well as you do. But I'm going to proceed as if we can call all of them "Institutions", and treat each of the four as subclasses of Institutions.
As other responders have noted, class/subclass extension and inheritance are not built into most relational database systems. But there is plenty of assistance, if you know the right buzzwords. What follows is intended to teach you the buzzwords, in database lingo. Here is a summary of the buzzwords coming: "ER Generalization", "ER Specialization", "Single Table Inheritance", "Class Table Inheritance", "Shared Primary Key".
ER modeling is a good way of understanding the data at the conceptual level. In ER modeling, there is a concept, "ER Generalization", and a counterpart concept "ER Specialization" that parallel the thought process I just presented above as "superclass/subclass". ER Specialization tells you how to diagram subclasses, but it doesn't tell you how to implement them.
Next we move down from the conceptual level to the logical level. We express the data in terms of relations or, if you will, SQL tables. There are a couple of techniques for implementing subclasses. One is called "Single Table Inheritance". The other is called "Class Table Inheritance". In connection with Class Table Inheritance, there is another technique that goes by the name "Shared Primary Key".
Going forward in your case with class table inheritance, we first design a table called "Institutions", with an Id field, a name field, and all of the fields that pertain to institutions, no matter which of the four kinds they are. Things like mailing address fields, for instance. Again, you understand your data better than I do, and you can find fields that are in all four of your existing tables. We populate the id field in the usual way.
Next we design four tables called "Hospitals", "Clinics", "Schools", and "Universities". These will contain an id field, plus all of the data fields that pertain only to that kind of institution. For instance, a hospital might have a "bed capacity". Again, you understand your data better than I do, and you can figure these out from the fields in your existing tables that didn't make it into the Institutions table.
This is where "shared primary key" comes in. When a new entry is made into "Institutions", we have to make a new parallel entry into one of four specialized subclass tables. But we don't use some sort of autonumber feature to populate the id field. Instead, we put a copy of the id field from the "Institutions" table into the id field of the subclass table.
This is a little work, but the benefits are well worth the effort. Shared primary key enforces the one-to-one nature of the relationship between subclass entries and superclass entries. It makes joining superclass data and subclass data simple, easy, and fast. It eliminates the need for a special field to tell you which subclass a given institution belongs in.
And, in your case, it provides a handy answer to your original question. The foreign key you were originally asking about is now always a foreign key to the Institutions table. And, because of the magic of shared-primary-key, the foreign key also references the entry in the appropriate subclass table, with no extra work.
You can create four views that combine institution data with each of the four subclass tables, for convenience.
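A minimal sketch of class table inheritance with a shared primary key, again using Python's sqlite3 module rather than SQL Server. The Institutions, Hospitals and Schools names come from the answer; the Inspections table (standing in for "the table with the foreign key"), the GradeLevels column and the HospitalDetails view name are hypothetical.

```python
import sqlite3

INSTITUTION_SCHEMA = """
CREATE TABLE Institutions (
    Id             INTEGER PRIMARY KEY,   -- populated "in the usual way"
    Name           TEXT NOT NULL,
    MailingAddress TEXT
);
-- Subclass tables: Id is both the primary key and a foreign key to Institutions.
CREATE TABLE Hospitals (
    Id          INTEGER PRIMARY KEY REFERENCES Institutions (Id),
    BedCapacity INTEGER
);
CREATE TABLE Schools (
    Id          INTEGER PRIMARY KEY REFERENCES Institutions (Id),
    GradeLevels TEXT
);
-- Hypothetical referencing table: its foreign key always targets Institutions.
CREATE TABLE Inspections (
    InspectionId  INTEGER PRIMARY KEY,
    InstitutionId INTEGER NOT NULL REFERENCES Institutions (Id)
);
-- Convenience view combining superclass and subclass columns.
CREATE VIEW HospitalDetails AS
    SELECT i.Id, i.Name, i.MailingAddress, h.BedCapacity
    FROM Institutions AS i
    JOIN Hospitals AS h ON h.Id = i.Id;
"""


def open_institution_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(INSTITUTION_SCHEMA)
    return conn


def add_hospital(conn, name, address, bed_capacity):
    """Insert the superclass row, then reuse its Id for the subclass row."""
    cur = conn.execute(
        "INSERT INTO Institutions (Name, MailingAddress) VALUES (?, ?)",
        (name, address),
    )
    new_id = cur.lastrowid
    conn.execute(
        "INSERT INTO Hospitals (Id, BedCapacity) VALUES (?, ?)",
        (new_id, bed_capacity),
    )
    return new_id


def add_inspection(conn, institution_id):
    """Return True if the inspection row is accepted, False if the FK rejects it."""
    try:
        conn.execute(
            "INSERT INTO Inspections (InstitutionId) VALUES (?)", (institution_id,)
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Note that nothing in this sketch stops the same Id from appearing in both Hospitals and Schools; if that matters, a type column and two-column foreign key can be layered on top.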
Look up "ER Specialization", "Class Table Inheritance", "Shared Primary Key", and maybe "Single Table Inheritance" on the web, and here in SO. There are tags for most of these concepts or techniques here in SO.

You could put a trigger on the table and enforce the referential integrity there. I don't think there's a really good out-of-the-box feature to implement this requirement.
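A sketch of the trigger idea, written in SQLite's trigger syntax and run through Python's sqlite3 module (SQL Server triggers are written differently). The Visits table and the trigger name are hypothetical, and the uniqueidentifier column is approximated with TEXT.

```python
import sqlite3

TRIGGER_SCHEMA = """
CREATE TABLE Hospitals    (ID TEXT PRIMARY KEY);
CREATE TABLE Clinics      (ID TEXT PRIMARY KEY);
CREATE TABLE Schools      (ID TEXT PRIMARY KEY);
CREATE TABLE Universities (ID TEXT PRIMARY KEY);

-- Hypothetical referencing table; ObjectID may point at any of the four tables.
CREATE TABLE Visits (
    VisitID  INTEGER PRIMARY KEY,
    ObjectID TEXT NOT NULL
);

-- Reject inserts whose ObjectID is not present in any of the four tables.
CREATE TRIGGER Visits_ObjectID_must_exist
BEFORE INSERT ON Visits
BEGIN
    SELECT RAISE(ABORT, 'ObjectID not found in Hospitals, Clinics, Schools or Universities')
    WHERE NOT EXISTS (SELECT 1 FROM Hospitals    WHERE ID = NEW.ObjectID)
      AND NOT EXISTS (SELECT 1 FROM Clinics      WHERE ID = NEW.ObjectID)
      AND NOT EXISTS (SELECT 1 FROM Schools      WHERE ID = NEW.ObjectID)
      AND NOT EXISTS (SELECT 1 FROM Universities WHERE ID = NEW.ObjectID);
END;
"""


def open_trigger_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(TRIGGER_SCHEMA)
    return conn


def add_visit(conn, object_id):
    """Return True if the row is inserted, False if the trigger aborts it."""
    try:
        conn.execute("INSERT INTO Visits (ObjectID) VALUES (?)", (object_id,))
        return True
    except sqlite3.DatabaseError:
        return False
```

This only guards inserts. A fuller version would also need triggers for updates to Visits and for deletes from the four parent tables, which is part of why the declarative approaches are usually easier to maintain.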

Related

The foreign key usage/extra table

I recently started looking into Database design. I have worked with Oracle, but would now like to create logical or conceptual design relationships first, before I implement them into the database. Learning the basics you could say.
I would like to create a database for cars. I have some tables, but am having trouble with the relationships, and when to use a foreign key/extra table.
I have created a car table, and added attributes. Now it is very clear to me to use a manufacturer foreign key in the car table referencing the manufacturer table.
But, for example, I would like to show what type (SUV, sedan, etc.) the car is. Furthermore, I would like to show what class (normal, upper class, etc.) the car is. Since I will only differentiate between a maximum of 5 car types, do I still need to add a foreign key? The same goes for the class situation as well.
I have heard that I should always use a foreign key, because it safeguards the integrity of the database, but at university my teacher always told us to use the minimum amount of tables possible, which puts me in an awkward spot.
What should I do?
I would greatly appreciate clarification in this matter.
Even in toy systems, you should use foreign keys and extra tables; learning this habit will serve you well when the database gets larger, or when you start designing real databases.
I imagine that the remark about "minimum amount of tables" (should be "minimum number") is to prevent you creating tables containing two values (e.g. "yes" and "no", "present" and "not present"), again a good habit to learn. It could also refer to queries, where you do not want to join unnecessary tables; this is liable to lead to the query retrieving duplicate rows.
Car type and car class are examples of what is called an enumerated domain. A domain is the set of values that an attribute may take on. In Oracle terms it's the set of values that may be placed in a given column. Oracle, unlike some other SQL dialects, does not have a CREATE DOMAIN statement. You may be able to use CREATE TYPE instead but I can't tell you how.
The simplest solution for you may be to just create a lookup table, as you suggest. Don't worry too much about added complexity here.
It is easier to manage 100 tables than 100,000 lines of code. That's an old slogan from the dba world, but it has some truth to it.
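A lookup table for car type might be sketched as follows, using Python's sqlite3 module for illustration (the question concerns Oracle, where the DDL differs slightly). The table and column names are hypothetical; a car class lookup would follow the same pattern.

```python
import sqlite3

CAR_SCHEMA = """
CREATE TABLE car_type (
    car_type_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE car (
    car_id      INTEGER PRIMARY KEY,
    model       TEXT NOT NULL,
    car_type_id INTEGER NOT NULL REFERENCES car_type (car_type_id)
);
-- The handful of allowed values lives in the lookup table, not in code.
INSERT INTO car_type (car_type_id, name) VALUES (1, 'SUV'), (2, 'Sedan');
"""


def open_car_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(CAR_SCHEMA)
    return conn


def add_car(conn, model, car_type_id):
    """Return True if the car is stored, False if its type is not in car_type."""
    try:
        conn.execute(
            "INSERT INTO car (model, car_type_id) VALUES (?, ?)",
            (model, car_type_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Adding a sixth car type later is then a single INSERT into car_type rather than a schema or code change.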

Separate tables for 1-1 relationship

I'm creating an Access database to hold student internship information. The issue I'm having is I have three tables that have a one and only one relationship with the internship table (Assignment, Supervisor Evaluation, and Student Evaluation).
Since Access doesn't allow a table to have more than one auto generated number, I can't let the internship table create the ID number for each of the three tables. So, I'm not sure how to make it so when we enter data into these tables forms, I can assign it specifically to an internship. Any advice?
1-1 relationships always smell like they should be merged into one table. This is particularly so if they are actually 1-1 and shouldn't be 1-0,1. In the latter case, if the dependent information can be missing and will be missing in a majority of cases, it might be helpful to separate it away into a table of its own. But even this can be expressed by giving null values to certain attributes.
Now if, for some reason, you insist on those 4 tables, there are two ways to go for the primary keys. One is, for the dependent tables, not to declare the primary key as auto-generated, but just as a number, and to assign to it the autogenerated value of the Internship record. Another is to auto-generate a primary key for each of the dependent tables, and have a foreign key in the Internship table for each of them. As I consider the entire construct of those dependent tables as unnecessarily complicated, I can't give a recommendation on which of these ways to prefer.
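The first of those two options could be sketched like this, with Python's sqlite3 module standing in for Access. Only the Assignment table is shown, and the column names are hypothetical; the two evaluation tables would follow the same pattern.

```python
import sqlite3

INTERNSHIP_SCHEMA = """
CREATE TABLE Internship (
    InternshipID INTEGER PRIMARY KEY AUTOINCREMENT,   -- the only generated key
    StudentName  TEXT NOT NULL
);
-- The dependent table's key is a plain number that reuses the Internship key.
CREATE TABLE Assignment (
    InternshipID INTEGER PRIMARY KEY REFERENCES Internship (InternshipID),
    Description  TEXT
);
"""


def open_internship_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(INTERNSHIP_SCHEMA)
    return conn


def add_internship(conn, student_name):
    """Insert an internship and return its generated key."""
    cur = conn.execute(
        "INSERT INTO Internship (StudentName) VALUES (?)", (student_name,)
    )
    return cur.lastrowid


def add_assignment(conn, internship_id, description):
    """Return True if stored; False for an unknown or already-used internship key."""
    try:
        conn.execute(
            "INSERT INTO Assignment (InternshipID, Description) VALUES (?, ?)",
            (internship_id, description),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Because InternshipID is the primary key of Assignment, a second assignment for the same internship is rejected, which is what keeps the relationship at most one-to-one.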
There is another concern I have about your data model. Your tables have those attributes like answer1, answer2, ... Now if you have a small fixed amount of those attributes, this might be okay. But could you have a larger set of fixed questions, maybe for each type of internship, that might vary dynamically and can't just be expressed by a fixed column structure? In that case you would need something like
Question(id, text)
Internship(id, ...)
Answer(id, internship_id, question_id, student_answer, supervisor_evaluation)
So your cardinalities would be
Internship 1-----0,n Answer 0,n------1 Question
Same for the other details of the internship.
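The Question / Internship / Answer layout above might look like this in SQL, shown here through Python's sqlite3 module. The column names follow the outline; the column types and the student_name column on Internship are assumptions.

```python
import sqlite3

ANSWER_SCHEMA = """
CREATE TABLE Question (
    id   INTEGER PRIMARY KEY,
    text TEXT NOT NULL
);
CREATE TABLE Internship (
    id           INTEGER PRIMARY KEY,
    student_name TEXT NOT NULL
);
-- One row per (internship, question) pair instead of answer1, answer2, ... columns.
CREATE TABLE Answer (
    id                    INTEGER PRIMARY KEY,
    internship_id         INTEGER NOT NULL REFERENCES Internship (id),
    question_id           INTEGER NOT NULL REFERENCES Question (id),
    student_answer        TEXT,
    supervisor_evaluation TEXT
);
"""


def open_answer_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(ANSWER_SCHEMA)
    return conn


def answers_for(conn, internship_id):
    """Return (question text, student answer) pairs for one internship."""
    return conn.execute(
        """
        SELECT q.text, a.student_answer
        FROM Answer AS a
        JOIN Question AS q ON q.id = a.question_id
        WHERE a.internship_id = ?
        ORDER BY q.id
        """,
        (internship_id,),
    ).fetchall()
```

New questions then become rows in Question rather than new columns on the evaluation tables.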

Name for DB Entity Relationship Table and is it a good idea?

I'm not a DB design expert and have what I suspect is a newbie question. If it's better answered in another forum (or from a simple reference), please let me know.
Given a table "Recordings" and a table "Artists". Both tables have primary keys suitably defined. There is a relationship that we want to express between these tables. Namely, an artist could have many recordings, or no recordings. A recording can only have 1 or 0 artists. (We could have some obscure recording with no known artist).
I thought the solution to this problem was to have a foreign key pointing to artist in the Recording table. This field could be null (the recording has no artist). Additionally, we should define cascading deletes, such that if an artist is deleted, all recordings that have a foreign key referring to that artist now have a foreign key of null. [I really do want to leave the actual recording when you delete the artist. Our real tables are not "artists" and "recordings" and a recording can exist without an artist].
However, this is not how my colleagues have set things up. There is no foreign key column in 'Recordings', but rather an extra table 'RecordingArtist_Mapping' with two columns,
RecordingKey ArtistKey
If an Artist (or Recording) is removed, the corresponding entry in this mapping table is removed. I'm not saying this is wrong, just different to what I expected. I have certainly seen a table like this when one has a many-many relationship, but not the relationship I described above.
So, my questions are:
Have you heard of this way of describing the relationship?
Is there a name for this type of table?
Is this a good way to model the relationship or would we be better off with the foreign key idea I explained? What are the pros/cons of each?
My colleagues pointed out that with the foreign key idea, you could have a lot of nulls in the Recordings Table, and that this violates (perhaps just in spirit?) one of the Five Normal Forms in Relational Database Theory. I'm way out of my league on this one :) Does it violate one of these forms? Which one? How? (bonus points for simple reference to "Five Normal Forms" :) ).
Thank you for your help and guidance.
Dave
On the face of it, this is simply an intersection table that allows a many-to-many relationship between two other tables.
When you find that you need one of these it is generally a good idea to consider "what does this table mean", and "have I included all the relevant attributes".
In this case the table tells you that the artist contributed to the recording in some way, and you might then consider "what was the nature of the contribution".
Possibly that they played a particular instrument, or instruments. Possibly they were a conductor.
You might then consider whether people other than artists made a contribution to the recording -- sound engineer? So that leads you to consider whether "artist" is a good table at all, because you might instead want a table that represents people in general, and then you can relate any of them to a recording. Maybe you even want to record the contribution of a non-person -- the London Symphony Orchestra, for example.
You can even have entities that contribute in multiple ways -- guitarist, vocalist, and producer? You might also consider whether there ought to be a ranking of the contributions so that they are listed in the correct order (which may be a contractual issue).
This is exactly the way that contributions to written works are generally modelled -- here is a list of the contributor codes used in the ONIX metadata schema for books, as an illustrative industry example: https://www.medra.org/stdoc/onix-codelist-17.htm
Your solution with a foreign key in Recording is absolutely correct from the Normalization Theory point of view; it does not violate any significant normal form (the most important ones are Third Normal Form and Boyce-Codd Normal Form, and neither of them is violated).
Moreover, apart from being conceptually simpler and safer, from a practical point of view it is more efficient, since it in general reduces the number of joins that must be done. In my opinion, the pros are greater than the cons.
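The nullable foreign key described in the question can be sketched as follows, using Python's sqlite3 module for illustration. The key column names are taken from the mapping table in the question; Name and Title are hypothetical. The behavior the asker calls a cascading delete corresponds to ON DELETE SET NULL.

```python
import sqlite3

RECORDING_SCHEMA = """
CREATE TABLE Artists (
    ArtistKey INTEGER PRIMARY KEY,
    Name      TEXT NOT NULL
);
CREATE TABLE Recordings (
    RecordingKey INTEGER PRIMARY KEY,
    Title        TEXT NOT NULL,
    -- NULL means "no known artist"; deleting the artist keeps the recording.
    ArtistKey    INTEGER REFERENCES Artists (ArtistKey) ON DELETE SET NULL
);
"""


def open_recording_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(RECORDING_SCHEMA)
    return conn


def artist_of(conn, recording_key):
    """Return the ArtistKey stored for a recording (None when it has no artist)."""
    row = conn.execute(
        "SELECT ArtistKey FROM Recordings WHERE RecordingKey = ?",
        (recording_key,),
    ).fetchone()
    return row[0]
```

After deleting an artist, their recordings remain in the table with ArtistKey set to NULL, and no mapping table or extra join is involved.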
Yes, that's a viable setup, this is called vertical partitioning.
Basically, you move your artist field from recording to another table with the primary key referencing that on recording.
The benefit is that you don't necessarily have to retrieve artists when doing lookups on recordings; the drawback is that if you still have to, it would be somewhat slower, because of an extra join.
Have you heard of this way of describing the relationship?
Yes, it's a many to many relationship. A recording can have more than one artist. An artist can have more than one recording.
Is there a name for this type of table?
I call them junction tables.
Is this a good way to model the relationship or would we be better off with the foreign key idea I explained? What are the pros/cons of each?
A junction table is required in a many to many relationship. When you have a one to many relationship, you would use a foreign key in the many table.
As for 4th and 5th level database normalization, the 1982 article "A Simple Guide to Five Normal Forms in Relational Database Theory" explains the different levels.
Under fourth normal form, a record type should not contain two or more independent multi-valued facts about an entity.
Fifth normal form deals with cases where information can be reconstructed from smaller pieces of information that can be maintained with less redundancy.
I remember the first 3 levels of normalization with this sentence.
I solemnly swear that the columns rely on the key, the whole key, and nothing but the key, so help me Codd.

How do I properly design a database? Foreign Keys vs Secondary Keys?

Here are some generic tables; I am trying to fully understand how to properly set up database tables. Are these set up correctly? I want to be able to look up a user's Items and Item Details as fast as possible. FYI, for this example the ItemDetailsX tables do not share the same data fields.
I am a little bit stuck on Foreign Keys and Secondary keys. When do you use a Secondary Key vs a Foreign Key?
tbl_Users 1:* tbl_Item //relationship
tbl_Item 1:1 tbl_ItemDetail1 & tbl_ItemDetail2 // relationship
tbl_Item 1:N tbl_ItemDetail3 //relationship
tbl_Users
-UserID - PK
tbl_Item
-ItemID - PK
-UserID - FK
tbl_ItemDetail1
-ItemDetail1ID - PK //Do I even need this if I have ItemID? Its a 1:1 relationship with
-ItemID - FK
-Count
-Duration
-Frequency
tbl_ItemDetail2
-ItemDetail2ID - PK //Do I even need this if I have ItemID? Its a 1:1 relationship with
-ItemID - FK
-OnOff
-Temperature
-Voltage
tbl_ItemDetail3
-ItemDetail3ID - PK //Has a 1:N relationship
-ItemID - FK
-Contrived Value1
-Contrived Value2
EDIT:
Thanks for the replies, I have updated my original post to properly reflect my database.
In the database that I am creating, the Item has ~9 item details. Each item details is 5-15 columns of data.
Having 1 table with like 100 columns does not make sense...?
Databases enforce 3 kinds of declarative integrity:
Integrity of domain - field's type and CHECK constraint.
Integrity of key - PRIMARY KEY or UNIQUE constraint.
Referential integrity - FOREIGN KEY.
A key uniquely identifies rows in the table. All keys are logically equivalent, but for practical reasons one of them is chosen as "primary" and the rest are considered "alternate" (there are some complications involving NULLs, but let's not get into that here).
On the other hand, a FOREIGN KEY is a kind of "pointer" from one table to another, where the DBMS itself guarantees this "pointer" can never "dangle". The foreign key references the (primary or alternate) key in the "parent" table, but the "child" endpoint does not need to be a key itself (and usually isn't).
When a row is modified or deleted from the parent table, this change is either cascaded to the child table (ON [UPDATE/DELETE] [CASCADE/SET NULL/SET DEFAULT]) or the whole operation is blocked (ON [UPDATE/DELETE] RESTRICT).
If a child is inserted or modified, it is checked against the parent table to make sure this new value exists there.
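The behavior just described can be sketched with the question's table names, using Python's sqlite3 module for illustration. Which referential action goes on which foreign key is an assumption chosen only to show both outcomes.

```python
import sqlite3

ITEM_SCHEMA = """
CREATE TABLE tbl_Users (
    UserID INTEGER PRIMARY KEY
);
CREATE TABLE tbl_Item (
    ItemID INTEGER PRIMARY KEY,
    -- Deleting a user who still owns items is blocked.
    UserID INTEGER NOT NULL REFERENCES tbl_Users (UserID) ON DELETE RESTRICT
);
CREATE TABLE tbl_ItemDetail3 (
    ItemDetail3ID INTEGER PRIMARY KEY,
    -- Deleting an item removes its detail rows as well.
    ItemID INTEGER NOT NULL REFERENCES tbl_Item (ItemID) ON DELETE CASCADE
);
"""


def open_item_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(ITEM_SCHEMA)
    return conn


def item_stmt_accepted(conn, sql):
    """Return True if the statement succeeds, False if a constraint blocks it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


def detail3_count(conn):
    """Number of rows currently in tbl_ItemDetail3."""
    return conn.execute("SELECT COUNT(*) FROM tbl_ItemDetail3").fetchone()[0]
```

Inserting an item for a nonexistent user fails the child-side check, deleting a user who owns items is blocked, and deleting an item silently removes its tbl_ItemDetail3 rows.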
The constraints change the meaning of data. Indexes, on the other hand, do not change the meaning of data - they are here purely for performance reasons. Some databases will even allow you to have a key without an underlying index, although this is usually a bad idea performance-wise. An index underneath the primary key is called "primary index" and all other indexes are "secondary".
BTW, there is "secondary index" and there is "alternate key", but there is no such thing as "secondary key".
I'm not quite sure what your design goal is, but I'm guessing something like this would be a decent starting point:
I see no purpose in extracting details to separate tables if they are always in 1:1 relationship with the item.
--- EDIT ---
Some questions you'll need to ask yourself before being able to arrive at optimal database design:
Is there a real 1:1 relationship between item and detail or is it actually 1:0..1 (i.e. some details are optional?).
If 1:1, just using columns is the most natural representation. BTW, a decent DBMS will have no trouble handling 100 columns.
If 1:0..1, you'll have to decide whether to use NULL-able columns, or separate tables. Just keep in mind that most DBMSes are really efficient in storing NULLs (typically just a small bitmap per row), so separating the data to a different table might not get you much, and in fact may substantially worsen the querying performance due to the increased need for JOINing.
Are all detail kinds predetermined (i.e. can you confidently say you won't need to add any new kinds of details later in the application's lifecycle)?
If yes, just use columns.
If no, adding columns on the large existing database can be expensive - whether it is expensive enough to warrant using separate table is up to you to measure.
You could also consider generalizing all the details as name/value pairs and representing them within a single 1:N table (not shown here). This is very flexible and "evolvable", but has its own set of problems.
How do you intend to query the data? This is a biggie and may influence substantially whether to go with "columns" or "separate table" approach, indexing etc...
BTW, the 1:0..1 with separate tables can be modeled like this...
...and 1:1 can be modeled like this...
...but this introduces circular dependency that must be handled in a special way (usually by deferring one of the FOREIGN KEYs).
1:N details, of course, are another matter and are naturally modeled through separate tables.
Since you say "detail 1" and "detail 2" are 1:(0..)1 and "detail 3" is 1:N, your "updated" data model would probably look something like this:
BTW, the above model uses identifying relationships which result in more "natural" keys. Non-identifying relationships / surrogate keys approach would look like this:
Each approach has its advantages, but this post is becoming a little long already ;) ...
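As one possible reading of the identifying-relationship variant, the sketch below keys the 1:0..1 detail table on ItemID alone and the 1:N detail table on ItemID plus a sequence number. It uses Python's sqlite3 module; the DetailNo and ContrivedValue1 column names are hypothetical, and only tbl_ItemDetail1 and tbl_ItemDetail3 are shown.

```python
import sqlite3

IDENTIFYING_SCHEMA = """
CREATE TABLE tbl_Item (
    ItemID INTEGER PRIMARY KEY
);
-- 1:0..1 detail: the parent's key is the whole primary key, so no separate
-- ItemDetail1ID is needed and at most one row per item can exist.
CREATE TABLE tbl_ItemDetail1 (
    ItemID    INTEGER PRIMARY KEY REFERENCES tbl_Item (ItemID),
    Count     INTEGER,
    Duration  INTEGER,
    Frequency INTEGER
);
-- 1:N detail: the parent's key is part of a composite primary key.
CREATE TABLE tbl_ItemDetail3 (
    ItemID          INTEGER NOT NULL REFERENCES tbl_Item (ItemID),
    DetailNo        INTEGER NOT NULL,
    ContrivedValue1 TEXT,
    PRIMARY KEY (ItemID, DetailNo)
);
"""


def open_identifying_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(IDENTIFYING_SCHEMA)
    return conn


def add_detail3(conn, item_id, detail_no, value):
    """Return True if the detail row is stored, False if a key constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO tbl_ItemDetail3 (ItemID, DetailNo, ContrivedValue1) "
            "VALUES (?, ?, ?)",
            (item_id, detail_no, value),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

In the surrogate-key variant, tbl_ItemDetail3 would instead keep its own ItemDetail3ID primary key with ItemID as an ordinary foreign key column, as in the question's original listing.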
Your question cannot be answered in one simple SO post. There are a lot of things to consider when creating a database. The best thing I ever did to learn about databases and how to create them was to read a book called "Database Design For Mere Mortals" written by Michael Hernandez.
See my post on Programmers to the question How do you approach database design?

Can a database table contain more than one primary key?

Can a database table contain more than one primary key?
Yes, I am talking about RDBMS.
A table can have:
No primary keys;
One primary key consisting of one column; or
One composite primary key consisting of two or more columns.
Other than that you can have any number of unique indexes, which will do basically the same thing.
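A quick way to see these rules in action is to try the DDL, here with Python's sqlite3 module (other RDBMSs reject the second case with their own error messages). The enrollment table and its columns are hypothetical.

```python
import sqlite3

# One composite primary key plus an additional unique (alternate) key: allowed.
COMPOSITE_PK_DDL = """
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL,
    course_id  INTEGER NOT NULL,
    seat_no    INTEGER UNIQUE,
    PRIMARY KEY (student_id, course_id)
)
"""

# Two separate PRIMARY KEY declarations on one table: rejected.
TWO_PKS_DDL = """
CREATE TABLE bad (
    a INTEGER PRIMARY KEY,
    b INTEGER PRIMARY KEY
)
"""


def can_create(ddl):
    """Return True if the CREATE TABLE statement is accepted by SQLite."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(ddl)
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()
```

Here `can_create(COMPOSITE_PK_DDL)` succeeds while `can_create(TWO_PKS_DDL)` does not: the second key has to be declared as UNIQUE rather than as another PRIMARY KEY.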
The primary key of a relational table uniquely identifies each record in the table.
So, in order to keep the uniqueness of each record, you can't have more than one primary key for the table.
It can either be a normal attribute that is guaranteed to be unique (such as Social Security Number in a table with no more than one record per person) or it can be generated by the DBMS (such as a globally unique identifier, or GUID, in Microsoft SQL Server). Primary keys may consist of a single attribute or multiple attributes in combination.
That's why it is called Primary Key because it is, well, PRIMARY
Yes, you can have Composite primary keys, that is, having two fields as a primary key.
"First of all, you have to understand the history of entity-relationship design methodology as well as understand the word "relational" in relational database management systems (RDBMS)."
May I suggest politely that you first get YOURSELF educated on these very same subjects before leading other people into flawed beliefs ? I'll respond to the two worst ones of your stupidities below.
"According to relational methodology principles, each entity should only have one and only one means to identify it."
That is about the biggest crap I have ever heard anybody spawn around about relational data design. The relational model does not constrain any "entity", as you erroneously call it, to have any precise number of keys. Any "entity" can have any number of keys, and EACH key is, by definition of its very property of making the "rows" unique, a valid candidate for any purpose of "identification". Choosing the most useful/appropriate one for use in certain contexts (foreign keys in referencing tables, e.g.), is a design issue, and the relational model does not have anything to say on such things.
"Therefore, "R"DBMS attempts to facilitate the modeling of entity relationships."
Codd's paper "A Relational Model of Data for Large Shared Data Banks", which marks the birth of the relational model, predates the invention of E-R by a number of years. So to say that the Relational model attempts to facilitate the modeling of E-R concepts, is having things COMPLETELY backwards, and nothing but a display of one's own complete and utter ignorance of "the history" that you referred to in your own answer.
The short answer is yes. A primary key is a candidate key and is in principle no different to any other candidate key. It is a widely observed convention that one candidate key per table is designated as the "primary" one - meaning that it is "preferred" or has some special meaning for the database designer or user. This is just convention however. It is only a label of convenience and a reminder about the potential significance of one key. In practice all keys can serve the same purpose and the "primary" one is not special or unique in any fundamental way.
First of all, you have to understand the history of entity-relationship design methodology as well as understand the word "relational" in relational database management systems (RDBMS).
In order to define the bounds of an entity and relationships to be formed, there must be a unique handle or a unique combination of handles to identify each single instance of an entity and then to form relationships between them.
You also need to understand the meaning/root of the word "identify" which is to zero in on the "identity" of each instance of an entity. "identity" being the mathematical term meaning "one" or a singularity.
According to relational methodology principles, each entity should only have one and only one means to identify it. Therefore, "R"DBMS attempts to facilitate the modeling of entity relationships. Note the differences between "Entity/Class" and "Entity/Class instance".
However, RDBMS is used widely and mostly by people not so interested in accurately portraying the E-R design principles. As a result, we frequently have more than one possible entity-definition sitting inside a table, which I call entity-aliasing. As opposed to identity-aliasing, where two or more instances of an entity-set hide under the same key, entity-aliasing is when a table like
EmpProj([empId], empName, empAddr, projId, projLoc)
actually has two entity-sets aliased under the same table:
Emp([empId], empName, empAddr)
Proj([projId], projLoc, empId)
That is when normalisation comes in - to separate these entities out. Try as we might to do a decent design normalisation, computer scientists may not have as good a perspective on the information as a statistician. The computer scientist (which in this discussion includes everyone with a decent knowledge of ER design) tries his/her best in creating a schema that cleanly defines entities and their relationships.
However, after 18 months analysing voluminous information from the database, the statistician begins to see principal components emerge whose analysis is terribly crippled by the misalignment of those principal components with the boundaries of the computer scientists' perceived entities.
That is what alternate unique keys are good for: to identify instances of entities due to the principal components existing as ghost-entities in the database.
Therefore, a table has a primary key because that table is perceived to be a perfect entity, and an entity should have only one primary key, be it singular or composite.
As far as the statistician is concerned, even though the database allows only one primary key per table, the alternative unique keys are, to the statistician, the primary keys to those ghost-entities. Which is why sometimes you are frustrated by statisticians who seem to do double work by downloading the data into the local database of their workstation/PC.
In conclusion, the constraint placed by the "R"DBMS manufacturer in allowing only one primary key per table is their pretense in believing that they know how information behaves and that the principal components of the information due to the population do not mutate over time.
If more than one unique key is possible in a table, it means one or more of the following:
Like myself, you are too lazy to separate them since they seem to work quite well.
For performance's sake, mixing the entities into the same table makes the application run much faster.
Like the statistician, you gradually discover ghost entities in your information.

Resources