I recently started looking into database design. I have worked with Oracle, but I would now like to create a logical or conceptual design first, before I implement it in the database. Learning the basics, you could say.
I would like to create a database for cars. I have some tables, but am having trouble with the relationships, and when to use a foreign key/extra table.
I have created a car table, and added attributes. Now it is very clear to me to use a manufacturer foreign key in the car table referencing the manufacturer table.
But, for example, I would like to show what type (SUV, sedan, etc.) the car is. Furthermore, I would like to show what class (normal, upper class, etc.) the car is. Since I will only differentiate between a maximum of 5 car types, do I still need to add a foreign key? The same goes for the class.
I have heard that you should always use a foreign key, because it safeguards the integrity of the database, but at university my teacher always told us to use as few tables as possible, which puts me in an awkward spot.
What should I do?
I would greatly appreciate clarification in this matter.
Even in toy systems, you should use foreign keys and extra tables; learning this habit will serve you well when the database gets larger, or when you start designing real databases.
I imagine that the remark about "minimum amount of tables" (should be "minimum number") is to prevent you creating tables containing two values (e.g. "yes" and "no", "present" and "not present"), again a good habit to learn. It could also refer to queries, where you do not want to join unnecessary tables; this is liable to lead to the query retrieving duplicate rows.
Car type and car class are examples of what is called an enumerated domain. A domain is the set of values that an attribute may take on. In Oracle terms it's the set of values that may be placed in a given column. Oracle, unlike some other SQL dialects, did not have a CREATE DOMAIN statement before release 23c. You may be able to use CREATE TYPE instead but I can't tell you how.
The simplest solution for you may be to just create a lookup table, as you suggest. Don't worry too much about added complexity here.
It is easier to manage 100 tables than 100,000 lines of code. That's an old slogan from the DBA world, but it has some truth to it.
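As a minimal sketch of the lookup-table approach, here is one way it could look. It uses Python's built-in sqlite3 module purely so the example is runnable; the DDL is close to standard SQL but is not Oracle syntax, and every table and column name (car_type, car_class, and so on) is an assumption for illustration.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative only.
SCHEMA = """
CREATE TABLE manufacturer (
    manufacturer_id INTEGER PRIMARY KEY,
    name            TEXT NOT NULL UNIQUE
);
CREATE TABLE car_type (
    car_type_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE car_class (
    car_class_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL UNIQUE
);
CREATE TABLE car (
    car_id          INTEGER PRIMARY KEY,
    model           TEXT NOT NULL,
    manufacturer_id INTEGER NOT NULL REFERENCES manufacturer (manufacturer_id),
    car_type_id     INTEGER NOT NULL REFERENCES car_type (car_type_id),
    car_class_id    INTEGER NOT NULL REFERENCES car_class (car_class_id)
);
INSERT INTO manufacturer (name) VALUES ('Example Motors');
INSERT INTO car_type (name) VALUES ('SUV'), ('sedan');
INSERT INTO car_class (name) VALUES ('normal'), ('upper class');
"""


def make_car_db():
    """Create an in-memory database with the lookup tables filled in."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    conn.executescript(SCHEMA)
    return conn


def try_insert_car(conn, model, manufacturer_id, car_type_id, car_class_id):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO car (model, manufacturer_id, car_type_id, car_class_id) "
                "VALUES (?, ?, ?, ?)",
                (model, manufacturer_id, car_type_id, car_class_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this in place, a car whose type is not one of the rows in car_type is rejected by the database itself, and adding a sixth type later is a single INSERT rather than a schema change.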
I have a small question concerning how I should design my database. I have a table dogs for an animal shelter and I have a table owners. The table dogs holds all dogs that are or once were in the shelter. Now I want to make a relation between the table dogs and the table owners.
The problem is, in this example not all dogs have an owner, and since an owner can have more than one dog, a possible foreign key should be put in the table dogs (a dog can't have more than one owner, at least not in the administration of the shelter). But if I do that, some dogs (the ones in the shelter) will have null as a foreign key. Reading some other topics taught me that that is allowed. (Or I might have read some wrong topics)
However, another possibility is putting a table in between the two tables - 'dogswithowners' for example - and put the primary key of both tables in there if a dog has an owner.
Now my question is (as you might have guessed): which of these two methods is best, and why?
The only solution that is in keeping with the principles of the Relational Model is the extra table.
Moreover, it's hard to imagine how you are going to find any hardware that is so slow that the difference in performance when you start querying is going to be noticeable. After all, it's not a mission-critical tens-of-thousands-of-transactions-per-second application, is it?
I agree with Philip and Erwin that the soundest and most flexible design is to create a new table.
One further issue with the null-based approach is that different software products disagree over how SQL's nullable foreign keys work. Even many IT professionals don't understand them properly, so the general user is even less likely to understand them.
The nullable foreign key is a typical solution.
The most straightforward one is just to have another table of owners and dogs, with foreign keys to the owner and dog tables and the dog column declared UNIQUE NOT NULL. Then if you only want owners or owned dogs, you do not have to involve IS NOT NULL in your queries, and the DBMS does not need to pick them out from among all owners and dogs. NULLs can simplify certain situations like this one, but they also complicate things compared to having a separate table and just joining when you want that data.
However, if it could become possible for a dog to have multiple owners, then you might need the extra table anyway, as a many-to-many relationship: without UNIQUE on the dog column alone, and with the column pair (owner, dog) UNIQUE NOT NULL instead. You can always start with the one UNIQUE NOT NULL and move to the other if things change.
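A minimal sketch of that separate-table design, using Python's built-in sqlite3 module so it can be run as-is. The table name dog_ownership and all column names are assumptions for illustration, not something from the question.

```python
import sqlite3


def make_shelter_db():
    """In-memory shelter database with a separate ownership table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    conn.executescript("""
        CREATE TABLE owner (owner_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE dog   (dog_id   INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- dog_id is UNIQUE NOT NULL: a dog has at most one owner.
        CREATE TABLE dog_ownership (
            dog_id   INTEGER NOT NULL UNIQUE REFERENCES dog (dog_id),
            owner_id INTEGER NOT NULL REFERENCES owner (owner_id)
        );
        INSERT INTO owner VALUES (1, 'Alice'), (2, 'Bob');
        INSERT INTO dog VALUES (1, 'Rex'), (2, 'Fido'), (3, 'Bella');
        INSERT INTO dog_ownership VALUES (1, 1), (3, 1);
    """)
    return conn


def owned_dogs(conn):
    """(dog name, owner name) pairs; no IS NOT NULL test is needed."""
    return conn.execute("""
        SELECT d.name, o.name
        FROM dog_ownership AS x
        JOIN dog   AS d ON d.dog_id = x.dog_id
        JOIN owner AS o ON o.owner_id = x.owner_id
        ORDER BY d.name
    """).fetchall()


def dogs_without_owner(conn):
    """Names of dogs that have no row in dog_ownership."""
    rows = conn.execute("""
        SELECT name FROM dog
        WHERE dog_id NOT IN (SELECT dog_id FROM dog_ownership)
        ORDER BY name
    """)
    return [name for (name,) in rows]
```

Moving to many-to-many later would mean replacing UNIQUE on dog_id with UNIQUE (dog_id, owner_id); the two parent tables stay untouched.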
In the olden days of newsgroups, we had this guy called -CELKO- who would pop up and say, "There is a design rule of thumb that says a relational table should model either an entity or a relationship between entities but never both." Not terribly formal but it is a good rule of thumb in my opinion.
Is 'owner' (person) really an attribute of a dog? It seems to me more like you want to model the relationship 'ownership' between a person and a dog.
Another useful rule of thumb is to avoid SQL nulls! Three-valued logic is confusing to most users and programmers, null behavior is inconsistent throughout the SQL Standard and (as sqlvogel points out) SQL DBMS vendors implement things in different ways. The best way of modelling missing data is by the omission of a tuple from a relvar (a.k.a. don't insert anything into your table!). For example, if Fido is included in Dog but omitted from DogOwnership, then according to the Closed World Assumption Fido sadly has no owner.
All this points to having two tables and no nullable columns.
I wouldn't add an extra table. If for some reason nulls are not allowed (why not is a good question), I would put some value that can't be a real key in place of null, e.g. NOT_SET or similar; I know some solutions do the same.
hope it helps
A nullable column used for foreign key relationship is perfectly valid and used for scenarios exactly like yours.
Adding another table to connect the owners table with the dogs table will create a many-to-many relationship, unless a unique constraint is created on one of its columns (dogs in your case).
Since you describe a one to many relationship, I would go with the first option, meaning having a nullable foreign key, since I find it more readable.
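For comparison, a minimal sketch of the nullable foreign key option, again using Python's built-in sqlite3 module so it runs as-is. Column names such as owner_id are assumptions for illustration.

```python
import sqlite3


def make_nullable_fk_db():
    """In-memory shelter database where dog.owner_id may be NULL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    conn.executescript("""
        CREATE TABLE owner (owner_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE dog (
            dog_id   INTEGER PRIMARY KEY,
            name     TEXT NOT NULL,
            -- NULL here means "no owner recorded"; non-NULL values must exist in owner.
            owner_id INTEGER REFERENCES owner (owner_id)
        );
        INSERT INTO owner VALUES (1, 'Alice');
        INSERT INTO dog VALUES (1, 'Rex', 1), (2, 'Fido', NULL), (3, 'Bella', 1);
    """)
    return conn


def dogs_with_owner_names(conn):
    """Every dog with its owner's name, or None when the dog has no owner."""
    return conn.execute("""
        SELECT d.name, o.name
        FROM dog AS d
        LEFT JOIN owner AS o ON o.owner_id = d.owner_id
        ORDER BY d.name
    """).fetchall()
```

The foreign key is still checked for every non-NULL value, so a dog cannot point at an owner that does not exist; only the "no owner" case is left to NULL.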
I'm not a DB design expert and have what I suspect is a newbie question. If it's better answered in another forum (or from a simple reference), please let me know.
Suppose we have a table "Recordings" and a table "Artists". Both tables have primary keys suitably defined. There is a relationship that we want to express between these tables. Namely, an artist could have many recordings, or no recordings. A recording can only have 1 or 0 artists. (We could have some obscure recording with no known artist).
I thought the solution to this problem was to have a foreign key pointing to artist in the Recording Table. This field could be null (the recording has no artist). Additionally, we should define cascading deletes, such that if an artist is deleted, all recordings that have a foreign key referring to that artist now have a foreign key of null. [I really do want to leave the actual recording when you delete the artist. Our real tables are not "artists" and "recordings" and a recording can exist without an artist].
However, this is not how my colleagues have set things up. There is no foreign key column in 'Recordings', but rather an extra table 'RecordingArtist_Mapping' with two columns, RecordingKey and ArtistKey.
If an Artist (or Recording) is removed, the corresponding entry in this mapping table is removed. I'm not saying this is wrong, just different to what I expected. I have certainly seen a table like this when one has a many-many relationship, but not the relationship I described above.
So, my questions are:
Have you heard of this way of describing the relationship?
Is there a name for this type of table?
Is this a good way to model the relationship or would we be better off with the foreign key idea I explained? What are the pros/cons of each?
My colleagues pointed out that with the foreign key idea, you could have a lot of nulls in the Recordings Table, and that this violates (perhaps just in spirit?) one of the Five Normal Forms in Relational Database Theory. I'm way out of my league on this one :) Does it violate one of these forms? Which one? How? (bonus points for simple reference to "Five Normal Forms" :) ).
Thank you for your help and guidance.
Dave
On the face of it, this is simply an intersection table that allows a many-to-many relationship between two other tables.
When you find that you need one of these it is generally a good idea to consider "what does this table mean", and "have I included all the relevant attributes".
In this case the table tells you that the artist contributed to the recording in some way, and you might then consider "what was the nature of the contribution".
Possibly that they played a particular instrument, or instruments. Possibly they were a conductor.
You might then consider whether people other than artists made a contribution to the recording -- sound engineer? So that leads you to consider whether "artist" is a good table at all, because you might instead want a table that represents people in general, and then you can relate any of them to a recording. Maybe you even want to record the contribution of a non-person -- the London Symphony Orchestra, for example.
You can even have entities that contribute in multiple ways -- guitarist, vocalist, and producer? You might also consider whether there ought to be a ranking of the contributions so that they are listed in the correct order (which may be a contractual issue).
This is exactly the way that contributions to written works are generally modelled -- here is a list of the contributor codes used in the ONIX metadata schema for books, as an illustrative industry example: https://www.medra.org/stdoc/onix-codelist-17.htm
Your solution with a foreign key in Recording is absolutely correct from the point of view of normalization theory: it does not violate any significant normal form (the most important ones are Third Normal Form and Boyce-Codd Normal Form, and neither of them is violated).
Moreover, apart from being conceptually simpler and safer, from a practical point of view it is more efficient, since in general it reduces the number of joins that must be done. In my opinion, the pros are greater than the cons.
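A minimal sketch of the foreign key design from the question, including the "set to null when the artist is deleted" behaviour, which in SQL is spelled ON DELETE SET NULL. It uses Python's built-in sqlite3 module so it runs as-is; the table and column names are assumptions, since the real tables were not given.

```python
import sqlite3


def make_recordings_db():
    """In-memory database: recording.artist_id is a nullable FK to artist."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    conn.executescript("""
        CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE recording (
            recording_id INTEGER PRIMARY KEY,
            title        TEXT NOT NULL,
            -- Deleting an artist keeps the recording and clears this column.
            artist_id    INTEGER REFERENCES artist (artist_id) ON DELETE SET NULL
        );
        INSERT INTO artist VALUES (1, 'Artist One'), (2, 'Artist Two');
        INSERT INTO recording VALUES
            (1, 'First Take', 1),
            (2, 'Second Take', 1),
            (3, 'Other Song', 2),
            (4, 'Field Tape', NULL);
    """)
    return conn


def delete_artist(conn, artist_id):
    """Delete one artist; the FK action nulls out references to it."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM artist WHERE artist_id = ?", (artist_id,))


def recordings(conn):
    """All (title, artist_id) pairs in insertion order."""
    return conn.execute(
        "SELECT title, artist_id FROM recording ORDER BY recording_id"
    ).fetchall()
```

After delete_artist(conn, 1), all four recordings are still present and the two that belonged to artist 1 have a NULL artist_id.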
Yes, that's a viable setup, this is called vertical partitioning.
Basically, you move your artist field from recording to another table with the primary key referencing that on recording.
The benefit is that you don't necessarily have to retrieve artists when doing lookups on recordings; the drawback is that if you still have to, it would be somewhat slower, because of an extra join.
Have you heard of this way of describing the relationship?
Yes, it's a many to many relationship. A recording can have more than one artist. An artist can have more than one recording.
Is there a name for this type of table?
I call them junction tables.
Is this a good way to model the relationship or would we be better off with the foreign key idea I explained? What are the pros/cons of each?
A junction table is required in a many to many relationship. When you have a one to many relationship, you would use a foreign key in the many table.
As far as 4th level and 5th level database normalization go, the 1982 article "A Simple Guide to Five Normal Forms in Relational Database Theory" explains the different levels.
Under fourth normal form, a record type should not contain two or more independent multi-valued facts about an entity.
Fifth normal form deals with cases where information can be reconstructed from smaller pieces of information that can be maintained with less redundancy.
I remember the first 3 levels of normalization with this sentence.
I solemnly swear that the columns rely on the key, the whole key, and nothing but the key, so help me Codd.
Suppose that we have a "Cash Transactions" table; as its name implies, it keeps track of cash I/O. There might be a case in the future where we have cash transactions about completely different concepts. Since we model these "concepts" in the database, we would like to have some way of linking transactions to the concepts. In other words, I would like to know which table and which entry a money transaction comes from.
I've come up with two solutions: the first involves a metadata column identifying the table plus a foreign key; the second uses as many foreign key columns as needed, only one of which is non-null, so we know from the column name which table to look in.
I reckon they both will work, but they feel hacky. It feels like there is an elegant solution, but it's not one of these two. Or perhaps I have hit the limit of relational DB design and should resort to NoSQL? How do I do this properly?
You should use a link table; the cash transactions table should be unaware of which table it links to.
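One possible reading of the link-table suggestion, sketched with Python's built-in sqlite3 module so it can be run. The concept tables (invoice, salary) and the one-link-table-per-concept layout are assumptions made for illustration; the answer itself does not spell out the layout.

```python
import sqlite3


def make_cash_db():
    """In-memory database: cash_transaction has no columns about its origin."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    conn.executescript("""
        CREATE TABLE cash_transaction (
            transaction_id INTEGER PRIMARY KEY,
            amount_cents   INTEGER NOT NULL
        );
        -- Hypothetical "concept" tables.
        CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
        CREATE TABLE salary  (salary_id  INTEGER PRIMARY KEY, employee TEXT NOT NULL);
        -- One link table per concept; each transaction appears at most once in each.
        CREATE TABLE invoice_transaction (
            transaction_id INTEGER PRIMARY KEY REFERENCES cash_transaction (transaction_id),
            invoice_id     INTEGER NOT NULL REFERENCES invoice (invoice_id)
        );
        CREATE TABLE salary_transaction (
            transaction_id INTEGER PRIMARY KEY REFERENCES cash_transaction (transaction_id),
            salary_id      INTEGER NOT NULL REFERENCES salary (salary_id)
        );
        INSERT INTO cash_transaction VALUES (1, 5000), (2, -120000), (3, 700);
        INSERT INTO invoice VALUES (10, 'ACME');
        INSERT INTO salary VALUES (20, 'Dana');
        INSERT INTO invoice_transaction VALUES (1, 10);
        INSERT INTO salary_transaction VALUES (2, 20);
    """)
    return conn


# (concept name, link table, key column); fixed identifiers, not user input.
LINKS = (
    ("invoice", "invoice_transaction", "invoice_id"),
    ("salary", "salary_transaction", "salary_id"),
)


def origin_of(conn, transaction_id):
    """Return (concept, id) for a transaction, or None if it is not linked."""
    for concept, table, key in LINKS:
        row = conn.execute(
            f"SELECT {key} FROM {table} WHERE transaction_id = ?",
            (transaction_id,),
        ).fetchone()
        if row is not None:
            return (concept, row[0])
    return None
```

Adding a new concept then means adding a new link table, with no change to cash_transaction. Note that nothing in this sketch stops the same transaction from appearing in two different link tables; that would need an extra rule.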
I'm really new to database design, as I will now demonstrate:
I have an MS SQL database that I need to add a table to. The table contains information that pertains to another table. However, there are no candidates for primary keys (all fields can be duplicates). The only thing the table will ever be used for is to keep records that may be required for a certain kind of query, and they can be retrieved super-easily using a field that my other tables also contain (but never uniquely).
Specifically, my main table has a bunch of chemistry records. Each chemistry record is associated with another set of records called quality-control records (in my second table). They are associated by a field called "BatchID". The super-easy part is that I can say, "get all records with this BatchID" and get exactly what I need. But there can be multiple instances of any BatchID in both tables (in fact, there usually are), so I'd need to jump through hoops to link them. In a more general sense, in theory, is it OK to have a table floating around not attached to anything?
The overwhelmingly simple solution is to just put the quality control in the db with no relationships to the chemistry table. I'd need to insert at least one other table to relate it to anything else, maybe more, and the only reason for complicating my life like that is that I don't want to violate some important precept of database design.
My question is, is it ever OK to just have a free-floating table in a database? Or is that right out?
Thanks for any help.
In theory, it's ok to have a table that doesn't have any foreign key constraints. But the table you describe (both tables you describe) should probably have a foreign key that references the table of batches. We'd expect the table of batches to have "BatchID" as its primary key.
The relational model requires tables to have at least one candidate key. It's almost always a bad idea to have a SQL table that doesn't have a candidate key.
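A minimal sketch of that layout, using Python's built-in sqlite3 module so it runs as-is rather than SQL Server syntax. Apart from BatchID (written batch_id here), every table and column name is an assumption; the surrogate integer keys are one simple way to give each table a candidate key.

```python
import sqlite3


def make_lab_db():
    """In-memory database: both record tables reference a table of batches."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    conn.executescript("""
        CREATE TABLE batch (batch_id TEXT PRIMARY KEY);
        CREATE TABLE chemistry_record (
            chemistry_record_id INTEGER PRIMARY KEY,  -- surrogate candidate key
            batch_id            TEXT NOT NULL REFERENCES batch (batch_id),
            analyte             TEXT NOT NULL,
            result              REAL
        );
        CREATE TABLE qc_record (
            qc_record_id INTEGER PRIMARY KEY,         -- surrogate candidate key
            batch_id     TEXT NOT NULL REFERENCES batch (batch_id),
            check_name   TEXT NOT NULL
        );
        INSERT INTO batch VALUES ('B1'), ('B2');
        INSERT INTO chemistry_record (batch_id, analyte, result) VALUES
            ('B1', 'Pb', 0.2), ('B1', 'Cd', 0.1), ('B2', 'Pb', 0.3);
        INSERT INTO qc_record (batch_id, check_name) VALUES
            ('B1', 'blank'), ('B1', 'spike'), ('B2', 'blank');
    """)
    return conn


def qc_for_batch(conn, batch_id):
    """The 'get all records with this BatchID' query from the question."""
    rows = conn.execute(
        "SELECT check_name FROM qc_record WHERE batch_id = ? ORDER BY qc_record_id",
        (batch_id,),
    )
    return [name for (name,) in rows]
```

The "super-easy" query is unchanged, but a quality-control record can no longer be stored for a batch that does not exist.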
I have a column with a uniqueidentifier that can potentially reference one of four different tables. I have seen this done in two ways, but both seem like bad practice.
First, I've seen a single ObjectID column without explicitly declaring it as a foreign key to a specific table. Then you can just shove any uniqueidentifier you want in it. This means you could potentially insert IDs from tables that are not part of the 4 tables I wanted.
Second, because the data can come from four different tables, I've also seen people make 4 different foreign keys. And in doing so, the system relies on ONE AND ONLY ONE column having a non-NULL value.
What's a better approach to doing this? For example, records in my table could potentially reference Hospitals(ID), Clinics(ID), Schools(ID), or Universities(ID)... but ONLY those tables.
Thanks!
You might want to consider a Type/SubType data model. This is very much like class/subclasses in object oriented programming, but much more awkward to implement, and no RDBMS (that I am aware of) natively supports them. The general idea is:
You define a Type (Building), create a table for it, give it a primary key
You define two or more sub-types (here, Hospital, Clinic, School, University), create tables for each of them, make primary keys… but the primary keys are also foreign keys that reference the Building table
Your table with one “ObjectType” column can now be built with a foreign key onto the Building table. You’d have to join a few tables to determine what kind of building it is, but you’d have to do that anyway. That, or store redundant data.
You have noticed the problem with this model, right? What’s to keep a Building from having entries in two or more of the subtype tables? Glad you asked:
Add a column, perhaps “BuildingType”, to Building, say char(1) with allowed values of {H, C, S, U} indicating (duh) type of building.
Build a unique constraint on BuildingID + BuildingType
Have the BuildingType column in the subtables. Put a check constraint on it so that it can only ever be set to that table's value (H for the Hospitals table, etc.). In theory, this could be a computed column; in practice, this won't work because of the following step:
Build the foreign key to relate the tables using both columns
Voila: Given a BUILDING row set with type H, an entry in the SCHOOL table (with type S) cannot be set to reference that Building
You will recall that I did say it was hard to implement.
In fact, the big question is: Is this worth doing? If it makes sense to implement the four (or more, as time passes) building types as type/subtype (further normalization advantages: one place for address and other attributes common to every building, with building-specific attributes stored in the subtables), it may well be worth the extra effort to build and maintain. If not, then you’re back to square one: a logical model that is hard to implement in the average modern-day RDBMS.
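The type/subtype steps above can be sketched as follows. This uses Python's built-in sqlite3 module so it runs as-is; it is not SQL Server syntax, only two of the four subtypes are shown, and the inspection table and all column names are assumptions for illustration.

```python
import sqlite3


def make_building_db():
    """In-memory database with a Building supertype and two subtype tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    conn.executescript("""
        CREATE TABLE building (
            building_id   INTEGER PRIMARY KEY,
            building_type TEXT NOT NULL CHECK (building_type IN ('H', 'C', 'S', 'U')),
            name          TEXT NOT NULL,
            UNIQUE (building_id, building_type)  -- target of the two-column FKs
        );
        CREATE TABLE hospital (
            building_id   INTEGER PRIMARY KEY,
            building_type TEXT NOT NULL DEFAULT 'H' CHECK (building_type = 'H'),
            FOREIGN KEY (building_id, building_type)
                REFERENCES building (building_id, building_type)
        );
        CREATE TABLE school (
            building_id   INTEGER PRIMARY KEY,
            building_type TEXT NOT NULL DEFAULT 'S' CHECK (building_type = 'S'),
            FOREIGN KEY (building_id, building_type)
                REFERENCES building (building_id, building_type)
        );
        -- Hypothetical referencing table: one ordinary FK, to the supertype only.
        CREATE TABLE inspection (
            inspection_id INTEGER PRIMARY KEY,
            building_id   INTEGER NOT NULL REFERENCES building (building_id)
        );
    """)
    return conn


def try_execute(conn, sql, params=()):
    """Return True if the statement succeeded, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

For example, after inserting building 1 with type 'H', a hospital row for building 1 is accepted, while a school row for building 1 is rejected because the pair (1, 'S') does not exist in building.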
Let's start at the conceptual level. If we think of Hospitals, Clinics, Schools, and Universities as classes of subject matter entities, is there a superclass that generalizes all of them? There probably is. I'm not going to try to tell you what it is, because I don't understand your subject matter as well as you do. But I'm going to proceed as if we can call all of them "Institutions", and treat each of the four as subclasses of Institutions.
As other responders have noted, class/subclass extension and inheritance are not built into most relational database systems. But there is plenty of assistance, if you know the right buzzwords. What follows is intended to teach you the buzzwords, in database lingo. Here is a summary of the buzzwords coming: "ER Generalization", "ER Specialization", "Single Table Inheritance", "Class Table Inheritance", "Shared Primary Key".
ER modeling is a good way of understanding the data at the conceptual level. In ER modeling, there is a concept, "ER Generalization", and a counterpart concept "ER Specialization" that parallel the thought process I just presented above as "superclass/subclass". ER Specialization tells you how to diagram subclasses, but it doesn't tell you how to implement them.
Next we move down from the conceptual level to the logical level. We express the data in terms of relations or, if you will, SQL tables. There are a couple of techniques for implementing subclasses. One is called "Single Table Inheritance". The other is called "Class Table Inheritance". In connection with Class Table Inheritance, there is another technique that goes by the name "Shared Primary Key".
Going forward in your case with class table inheritance, we first design a table called "Institutions", with an Id field, a name field, and all of the fields that pertain to institutions, no matter which of the four kinds they are. Things like mailing address fields, for instance. Again, you understand your data better than I do, and you can find fields that are in all four of your existing tables. We populate the id field in the usual way.
Next we design four tables called "Hospitals", "Clinics", "Schools", and "Universities". These will contain an id field, plus all of the data fields that pertain only to that kind of institution. For instance, a hospital might have a "bed capacity". Again, you understand your data better than I do, and you can figure these out from the fields in your existing tables that didn't make it into the Institutions table.
This is where "shared primary key" comes in. When a new entry is made into "Institutions", we have to make a new parallel entry into one of four specialized subclass tables. But we don't use some sort of autonumber feature to populate the id field. Instead, we put a copy of the id field from the "Institutions" table into the id field of the subclass table.
This is a little work, but the benefits are well worth the effort. Shared primary key enforces the one-to-one nature of the relationship between subclass entries and superclass entries. It makes joining superclass data and subclass data simple, easy, and fast. It eliminates the need for a special field to tell you which subclass a given institution belongs in.
And, in your case, it provides a handy answer to your original question. The foreign key you were originally asking about is now always a foreign key to the Institutions table. And, because of the magic of shared-primary-key, the foreign key also references the entry in the appropriate subclass table, with no extra work.
You can create four views that combine institution data with each of the four subclass tables, for convenience.
Look up "ER Specialization", "Class Table Inheritance", "Shared Primary Key", and maybe "Single Table Inheritance" on the web, and here in SO. There are tags for most of these concepts or techniques here in SO.
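A minimal sketch of class table inheritance with a shared primary key, plus one of the convenience views. It uses Python's built-in sqlite3 module so it runs as-is; only two of the four subclass tables are shown, and the column names and the visit table (standing in for the table that holds the original foreign key) are assumptions.

```python
import sqlite3


def make_institution_db():
    """In-memory database: subclass tables reuse the institution id as their key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    conn.executescript("""
        CREATE TABLE institution (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        -- Shared primary key: id is both the PK and an FK to institution.
        CREATE TABLE hospital (
            id           INTEGER PRIMARY KEY REFERENCES institution (id),
            bed_capacity INTEGER
        );
        CREATE TABLE school (
            id             INTEGER PRIMARY KEY REFERENCES institution (id),
            pupil_capacity INTEGER
        );
        -- Hypothetical referencing table: always points at institution.
        CREATE TABLE visit (
            visit_id       INTEGER PRIMARY KEY,
            institution_id INTEGER NOT NULL REFERENCES institution (id)
        );
        CREATE VIEW hospital_view AS
            SELECT i.id, i.name, h.bed_capacity
            FROM institution AS i
            JOIN hospital AS h ON h.id = i.id;
    """)
    return conn


def add_hospital(conn, name, bed_capacity):
    """Insert the superclass row, then copy its id into the subclass row."""
    with conn:  # both inserts commit together or not at all
        cur = conn.execute("INSERT INTO institution (name) VALUES (?)", (name,))
        institution_id = cur.lastrowid
        conn.execute(
            "INSERT INTO hospital (id, bed_capacity) VALUES (?, ?)",
            (institution_id, bed_capacity),
        )
    return institution_id
```

A row in visit needs only the institution id; joining through hospital_view (or an equivalent view per subclass) recovers the subclass-specific columns.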
You could put a trigger on the table and enforce the referential integrity there. I don't think there's a really good out-of-the-box feature to implement this requirement.
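A rough sketch of the trigger idea, written for SQLite through Python's built-in sqlite3 module so it can be run; SQL Server trigger syntax is different. The note table and its object_id column are assumptions, only inserts are checked here, and a fuller version would also need to cover updates and deletes on the four parent tables.

```python
import sqlite3


def make_polymorphic_db():
    """In-memory database where a trigger checks object_id against four tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE hospitals    (id TEXT PRIMARY KEY);
        CREATE TABLE clinics      (id TEXT PRIMARY KEY);
        CREATE TABLE schools      (id TEXT PRIMARY KEY);
        CREATE TABLE universities (id TEXT PRIMARY KEY);
        -- Hypothetical referencing table with a single untyped reference column.
        CREATE TABLE note (
            note_id   INTEGER PRIMARY KEY,
            object_id TEXT NOT NULL
        );
        CREATE TRIGGER note_object_must_exist
        BEFORE INSERT ON note
        FOR EACH ROW
        BEGIN
            -- Abort the insert unless the id exists in one of the four tables.
            SELECT RAISE(ABORT, 'object_id must reference one of the four tables')
            WHERE NOT EXISTS (SELECT 1 FROM hospitals    WHERE id = NEW.object_id)
              AND NOT EXISTS (SELECT 1 FROM clinics      WHERE id = NEW.object_id)
              AND NOT EXISTS (SELECT 1 FROM schools      WHERE id = NEW.object_id)
              AND NOT EXISTS (SELECT 1 FROM universities WHERE id = NEW.object_id);
        END;
        INSERT INTO hospitals VALUES ('h-1');
        INSERT INTO schools VALUES ('s-1');
    """)
    return conn


def try_add_note(conn, object_id):
    """Return True if the note was inserted, False if the trigger rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO note (object_id) VALUES (?)", (object_id,))
        return True
    except sqlite3.DatabaseError:
        return False
```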