Separate tables for 1-1 relationship - database

I'm creating an Access database to hold student internship information. The issue I'm having is I have three tables that have a one and only one relationship with the internship table (Assignment, Supervisor Evaluation, and Student Evaluation).
Since Access doesn't allow a table to have more than one auto generated number, I can't let the internship table create the ID number for each of the three tables. So, I'm not sure how to make it so when we enter data into these tables forms, I can assign it specifically to an internship. Any advice?

1-1 relationships always smell like they should be merged into one table. This is particularly so if they are actually 1-1 and shouldn't be 1-0,1. In the latter case, if the dependent information can be missing and will be missing in a majority of cases, it might be helpful to separate it away into a table of its own. But even this can be expressed by giving null values to certain attributes.
Now if, for some reason, you insist on those 4 tables, there are two ways to go for the primary keys. One is, for the dependent tables, not to declare the primary key as auto-generated, but just as a number, and to assign to it the autogenerated value of the Internship record. Another is to auto-generate a primary key for each of the dependent tables, and have a foreign key in the Intership table for each of them. As I consider the entire construct of those dependent tables as unnecessarily complicated, I can't give a recommendation on which of these ways to prefer.
There is another concern I have about your data model. Your tables have those attributes like answer1, answer2, ... Now if you have a small fixed amount of those attributes, this might be okay. But could you have a larger set of fixed questions, maybe for each type of internship, that might vary dynamically and can't just be expressed by a fixed column structure? In that case you would need something like
Question(id, text)
Internship(id, ...)
Answer(id, internship_id, question_id, student_answer, supervisor_evaluation)
So your cardinalities would be
Internship 1-----0,n Answer 0,n------1 Question
Same for the other details of the internship.

Related

Where is the limit to what would be considered data duplication in a database?

What is the line that you should draw when normalising data, in terms of data duplication? i.e would you say that 2 employees who share the same birthday or have the same timestamp for a shift is data duplication? and therefore should be placed into another data table?
Birth date has full and non-transitive dependency to a person which means that it should be stored within the same table where you keep your employees and it would comply with third normal form (3NF).
Work shifts are not an attribute of an employee which means that they are a different entity and stay in relation with employee entity.
There is no a particular 'limit' when following the normalisation to data, since the main restriction that is given for every relational database table is to have an unique parimary key. Hence, if all other columns contain the same data, but the primary key is still different, it is a different row of a table.
The actual restrictions can come in two form. One is either the programming or systhematic approach, where the restriction on what kind of data is inputed is given from a program which interacts with the database or already defined script handed down physically for the admin of the database.
Other, more database-orriented approach would be to create primary keys composed of multiple columns. That way a row is unique only if for both columns the data is unique. It should be noted that a primary key is not necessary the same as an unique key, which should be different for every instance.
You have misunderstood what normalization does.
Two attributes having the same value (i.e. two employees having the same birthday) is not redundancy.
Rather having the same attribute in the two tables (i.e. two tables having birthday column, therefore repeating every employee's birthday information) is.
Normalization is a quality decision and denormalization is a performance decision. For my school projects, my teachers recommended me to normalize at least till 3NF. So that may be a good guideline.

Should the contents of a column acting as a primary key be interpretable or purely unique integers

I have the luxury of designing a database from scratch. When designing columns to act as unique keys should I just use unique integers or should I attempt to make the values interpretable. So if I had a lookup table of ward names in a hospital should the id column contain unique codes that in someway relate to the name of the ward or just unique integers?
Resist the temptation to overload the id values with meaning. Use other attributes to store the info you're considering stuffing into the id.
Overloading the id with "meaning" is bad because:
If the data being stuffed into the ID changes, so must your ID. ID's should never change
If the data type of the data changes, you'll have a problem, for example:
If your ID is numeric, and the stuffed info changes from numeric to text, you'll have big problems
If the stuffed data changes from a simple field to a one-to-many child, your model will break
What you believe has "important" meaning now may not be important in the future. Then your "specially encoded" data will become useless and a burden, even a serious restriction
What currently "identifies" a product may change as the business evolves
If have seen this idea attempted many times, never successfully. In every case, the idea was scraped and surrogate IDs were introduced to replace the magic IDs, with all the risk and development cost associated with that task.
In my career, have seen most of the problems listed above actually happen.
You should not be using a lookup table. Make your tables innodb and use referential integrity to join tables together. Your id columns should always be set as primary and should be set to auto increment. Never try to make up your own ids. You should really look at some tutorial on referential integrity and learn how to assoicate tables with other tables.

Should I make a foreign key that can be null or make a new table?

I have a small question concerning with how I should design my database. I have a table dogs for an animal shelter and I have a table owners. In the table dogs all dogs that are and once were in the shelter are being put. Now I want to make a relation between the table dogs and the table owners.
The problem is, in this example not all dogs have an owner, and since an owner can have more than one dog, a possible foreign key should be put in the table dogs (a dog can't have more than one owner, at least not in the administration of the shelter). But if I do that, some dogs (the ones in the shelter) will have null as a foreign key. Reading some other topics taught me that that is allowed. (Or I might have read some wrong topics)
However, another possibility is putting a table in between the two tables - 'dogswithowners' for example - and put the primary key of both tables in there if a dog has an owner.
Now my question is (as you might have guessed) what the best method is of these two and why?
The only solution that is in keeping with the principles of the Relational Model is the extra table.
Moreover, it's hard to imagine how you are going to find any hardware that is so slow that the difference in performance when you start querying, is going to be noticeable. After all, it's not a mission-critical tens-of-thousands-of-transactions-per-second appliation, is it ?
I agree with Philip and Erwin that the soundest and most flexible design is to create a new table.
One further issue with the null-based approach is that different software products disagree over how SQL's nullable foreign keys work. Even many IT professionals don't understand them properly so the general user is even less likely to understand it.
The nullable foreign key is a typical solution.
The most straightforward one is just to have another table of owners and dogs, with foreign keys to the owner and dog tables with the dog column UNIQUE NOT NULL. Then if you only want owners or owned dogs you do not have to involve IS NOT NULL in your queries and the DBMS does not need to access them among all owners and dogs. NULLs can simplify certain situations like this one but they also complicate compared to having a separate table and just joining when you want that data.
However, if it could become possible for a dog to have multiple owners then you might need the extra table anyway as many:many relationship without the UNIQUE NOT NULL column and the column pair owner-dog UNIQUE NOT NULL instead. You can always start with the one UNIQUE NOT NULL and move to the other if things change.
In the olden days of newsgroups, we had this guy called -CELKO- who would pop up and say, "There is a design rule of thumb that says a relational table should model either an entity or a relationship between entities but never both." Not terribly formal but it is a good rule of thumb in my opinion.
Is 'owner' (person) really an attribute of a dog? It seems to me more like you want to model the relationship 'ownership' between a person and a dog.
Another useful rule of thumb is to avoid SQL nulls! Three-valued logic is confusing to most users and programmers, null behavior is inconsistent throughout the SQL Standard and (as sqlvogel points out) SQL DBMS vendors implementation things in different ways. The best way of modelling missing data is by the omission of tuple in a relvar (a.k.a. don't insert anything into your table!). For example, Fido is included in Dog but omitted from DogOwnership then according to the Closed World Assumption Fido sadly has no owner.
All this points to having two tables and no nullable columns.
I wouldn't do any extra table. If for some reason no nulls allowed (it's a good question why) - I would, and I know some solutions do the same, put instead of null some value, that can't be a real key. e.g NOT_SET or so.
hope it helps
A nullable column used for foreign key relationship is perfectly valid and used for scenarios exactly like yours.
Adding another table to connect the owners table with the dogs table will create a many to many relationship, unless a unique constraint is created on one of it's columns (dogs in your case).
Since you describe a one to many relationship, I would go with the first option, meaning having a nullable foreign key, since I find it more readable.

Database Mapping - Multiple Foreign Keys

I want to make sure this is the best way to handle a certain scenario.
Let's say I have three main tables I will keep them generic. They all have primary keys and they all are independent tables referencing nothing.
Table 1
PK
VarChar Data
Table 2
PK
VarChar Data
Table 3
PK
VarChar Data
Here is the scenario, I want a user to be able to comment on specific rows on each of the above tables. But I don't want to create a bunch of comment tables. So as of right now I handled it like so..
There is a comment table that has three foreign key columns each one references the main tables above. There is a constraint that only one of these columns can be valued.
CommentTable
PK
FK to Table1
FK to Table2
FK to Table3
VarChar Comment
FK to Users
My question: is this the best way to handle the situation? Does a generic foreign key exist? Or should I have a separate comments table for each main table.. even though the data structure would be exactly the same? Or would a mapping table for each one be a better solution?
My question: is this the best way to handle the situation?
Multiple FKs with a CHECK that allows only one of them to be non-NULL is a reasonable approach, especially for relatively few tables like in this case.
The alternate approach would be to "inherit" the Table 1, 2 and 3 from a common "parent" table, then connect the comments to the parent.
Look here and here for more info.
Does a generic foreign key exist?
If you mean a FK that can "jump" from table to table, then no.
Assuming all 3 FKs are of the same type1, you could theoretically implement something similar by keeping both foreign key value and referenced table name2 and then enforcing it through a trigger, but declarative constraints should be preferred over that, even at a price of slightly more storage space.
If your DBMS fully supports "virtual" or "calculated" columns, then you could do something similar to above, but instead of having a trigger, generate 3 calculated columns based on FK value and table name. Only one of these calculated columns would be non-NULL at any given time and you could use "normal" FKs for them as you would for the physical columns.
But, all that would make sense when there are many "connectable" tables and your DBMS is not thrifty in storing NULLs. There is very little to gain when there are just 3 of them or even when there are many more than that but your DBMS spends only one bit on each NULL field.
Or should I have a separate comments table for each main table, even though the data structure would be exactly the same?
The "data structure" is not the only thing that matters. If you happen to have different constraints (e.g. a FK that applies to one of them but not the other), that would warrant separate tables even though the columns are the same.
But, I'm guessing this is not the case here.
Or would a mapping table for each one be a better solution?
I'm not exactly sure what you mean by "mapping table", but you could do something like this:
Unfortunately, that would allow a single comment to be connected to more than one table (or no table at all), and is in itself a complication over what you already have.
All said and done, your original solution is probably fine.
1 Or you are willing to store it as string and live with conversions, which you should be reluctant to do.
2 In practice, this would not really be a name (as in string) - it would be an integer (or enum if DBMS supports it) with one of the well-known predefined values identifying the table.
Thanks for all the help folks, i was able to formulate a solution with the help of a colleague of mine. Instead of multiple mapping tables i decided to just use one.
This mapping table holds a group of comments, so it has no primary key. And each group row links back to a comment. So you can have multiple of the same group id. one-many-one would be the relationship.

Foreign Key Referencing Multiple Tables

I have a column with a uniqueidentifier that can potentially reference one of four different tables. I have seen this done in two ways, but both seem like bad practice.
First, I've seen a single ObjectID column without explicitly declaring it as a foreign key to a specific table. Then you can just shove any uniqueidentifier you want in it. This means you could potentially insert IDs from tables that are not part of the 4 tables I wanted.
Second, because the data can come from four different tables, I've also seen people make 4 different foreign keys. And in doing so, the system relies on ONE AND ONLY ONE column having a non-NULL value.
What's a better approach to doing this? For example, records in my table could potentially reference Hospitals(ID), Clinics(ID), Schools(ID), or Universities(ID)... but ONLY those tables.
Thanks!
You might want to consider a Type/SubType data model. This is very much like class/subclasses in object oriented programming, but much more awkward to implement, and no RDBMS (that I am aware of) natively supports them. The general idea is:
You define a Type (Building), create a table for it, give it a primary key
You define two or more sub-types (here, Hospital, Clinic, School, University), create tables for each of them, make primary keys… but the primary keys are also foreign keys that reference the Building table
Your table with one “ObjectType” column can now be built with a foreign key onto the Building table. You’d have to join a few tables to determine what kind of building it is, but you’d have to do that anyway. That, or store redundant data.
You have noticed the problem with this model, right? What’s to keep a Building from having entries in in two or more of the subtype tables? Glad you asked:
Add a column, perhaps “BuildingType”, to Building, say char(1) with allowed values of {H, C, S, U} indicating (duh) type of building.
Build a unique constraint on BuildingID + BuildingType
Have the BulidingType column in the subtables. Put a check constraint on it so that it can only ever be set to the value (H for the Hospitals table, etc.) In theory, this could be a computed column; in practice, this won't work because of the following step:
Build the foreign key to relate the tables using both columns
Voila: Given a BUILDING row set with type H, an entry in the SCHOOL table (with type S) cannot be set to reference that Building
You will recall that I did say it was hard to implement.
In fact, the big question is: Is this worth doing? If it makes sense to implement the four (or more, as time passes) building types as type/subtype (further normalization advantages: one place for address and other attributes common to every building, with building-specific attributes stored in the subtables), it may well be worth the extra effort to build and maintain. If not, then you’re back to square one: a logical model that is hard to implement in the average modern-day RDBMS.
Let's start at the conceptual level. If we think of Hospitals, Clinics, Schools, and Universities as classes of subject matter entities, is there a superclass that generalizes all of them? There probably is. I'm not going to try to tell you what it is, because I don't understand your subject matter as well as you do. But I'm going to proceed as if we can call all of them "Institutions", and treat each of the four as subclasses of Institutions.
As other responders have noted, class/subclass extension and inheritance are not built into most relational database systems. But there is plenty of assistance, if you know the right buzzwords. What follows is intended to teach you the buzzwords, in database lingo. Here is a summary of the buzzwords coming: "ER Generalization", "ER Specialization", "Single Table Inheritance", "Class Table Inheritance", "Shared Primary Key".
Staying at the conceptual level, ER modeling is a good way of understanding the data at a conceptual level. In ER modeling, there is a concept, "ER Generalization", and a counterpart concept "ER Specialization" that parallel the thought process I just presented above as "superclass/subclass". ER Specialization tells you how to diagram subclasses, but it doesn't tell you how to implement them.
Next we move down from the conceptual level to the logical level. We express the data in terms of relations or, if you will, SQL tables. There are a couple of techniques for implementing subclasses. One is called "Single Table Inheritance". The other is called "Class Table Inheritance". In connection with Class table inheritance, there is another technique that goes by the name "Shared primary Key".
Going forward in your case with class table inheritance, we first design a table called "Institutions", with an Id field, a name field, and all of the fields that pertain to institutions, no matter which of the four kinds they are. Things like mailing address fields, for instance. Again, you understand your data better than I do, and you can find fields that are in all four of your existing tables. We populate the id field in the usual way.
Next we design four tables called "Hospitals", "Clinics", "Schools", and "Universities". These will contain an id field, plus all of the data fields that pertain only to that kind of institution. For instance, a hospital might have a "bed capacity". Again, you understand your data better than I do, and you can figure these out from the fields in your existing tables that didn't make it into the Institutions table.
This is where "shared primary key" comes in. When a new entry is made into "Institutions", we have to make a new parallel entry into one of four specialized subclass tables. But we don't use some sort of autonumber feature to populate the id field. Instead, we put a copy of the id field from the "Institutions" table into the id field of the subclass table.
This is a little work, but the benefits are well worth the effort. Shared primary key enforces the one-to-one nature of the relationship between subclass entries and superclass entries. It makes joining superclass data and subclass data simple, easy, and fast. It eliminates the need for a special field to tell you which subclass a given institution belongs in.
And, in your case, it provides a handy answer to your original question. The foreign key you were originally asking about is now always a foreign key to the Institutions table. And, because of the magic of shared-primary-key, the foreign key also references the entry in the appropriate subclass table, with no extra work.
You can create four views that combine institution data with each of the four subclass tables, for convenience.
Look up "ER Specialization", "Class Table Inheritance", "Shared Primary Key", and maybe "Single Table Inheritance" on the web, and here in SO. There are tags for most of these concepts or techniques here in SO.
You could put a trigger on the table and enforce the referential integrity there. I don't think there's a really good out-of-the-box feature to implement this requirement.

Resources