One of my team's big tasks this year is to overhaul and normalize our legacy database structure. Many columns used by the application as foreign keys don't actually have a foreign key relationship defined in the database, and we are in the process of fixing that.
I'm wondering if there is a nice, succinct way of describing such a situation where there's a column that's clearly supposed to be a FK but is not defined in the database. I've been using "unofficial foreign key" or "loose foreign key" but usually those terms aren't clear enough and I have to explain what I mean (which is of course doable but takes a little time).
I haven't found any answers on Google; most results just describe why defining a FK is better. I'm also relatively new to DB design, so perhaps I'm just thinking about this the wrong way.
Is there such a term?
"Fake" might get the point across more clearly, or, since foreign keys are a type of constraint, calling it "unenforced" would be more idiomatic.
How normalized is this table?
Example SqlFiddle
I know that normalization itself has been discussed at length, but I was hoping to get some clarification on my understanding of it. As an example, I quickly drew a diagram in Access; I believe the tables and relationships in it all meet the 3NF criteria. There is a Projects table with the fields ProjNumber (PK), ProjName, and ProjDesc. Then there is an Assignments table with a compound key consisting of EmpID/ProjNumber, plus the fields HourlyBillingRate, NumOfHours, and TeamNum. Lastly, there is a Teams table with the fields TeamNum (PK), TeamName, and ProjNumber.
The ProjNumber from Assignments and Teams are both foreign keys that relate back to the Projects table, and the TeamNum field from Assignments is a foreign key relating back to the Teams table primary key. I'm not too sure if it's necessary to directly relate back to the Teams table, if I have the ProjNumber foreign key because that project would have an associated TeamNum.
The context of these tables is that there is a project that has to be done, a team associated with carrying out that project, and the employees on that team, who are paid an hourly billing rate for the project they are working on.
I use a compound key because I wanted to answer the question "What if the employee works on multiple projects?" I couldn't make EmpID the sole primary key, so I made it a compound key: even if the employee works on multiple projects, the combination of the two will always be unique. I believe that each field is necessary and depends fully on its table's primary key.
Thoughts? Does it in fact fulfill 3NF criteria?
It depends. Your diagram and discussion appear to assume that the primary key is the only candidate key in each of the tables. That appears not to be the case.
In the Assignments table, it looks as though EmpID and TeamNumber is another candidate key, provided that TeamNumber may not be NULL.
If we look at this table with EmpId, TeamNumber as the key, then it is not in 2NF. ProjNumber is determined by TeamNumber, which is not the whole key.
So now the answer to your question turns on whether FDs are analyzed with respect to all candidate keys or just the declared primary key. I have seen tutorials on normalization that go both ways. I follow the one that considers all candidate keys, so the table is not in 2NF.
Unless I've misconstrued the FDs in your case, or Assignments.TeamNumber can be NULL.
HOWEVER, your SQL Fiddle presentation is different. Now, if there are several teams on one project, and an employee is assigned to one project for a few hours, there isn't any way to tell what team the employee was on. The FDs are not the same in the SQL Fiddle example and in the implications I take from your diagram.
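To make the partial dependency described above concrete, here is a minimal sketch that tests whether a functional dependency holds in a set of rows. The helper name `holds` and the sample rows are hypothetical, invented purely for illustration; they are not taken from the asker's diagram or SQL Fiddle.

```python
def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent is not violated by these rows."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        value = tuple(row[c] for c in dependent)
        # setdefault stores the first value seen for this key; any later
        # row with the same key but a different value violates the FD.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows for the Assignments table (assumed data).
assignments = [
    {"EmpID": 1, "ProjNumber": 10, "TeamNum": 100},
    {"EmpID": 2, "ProjNumber": 10, "TeamNum": 100},
    {"EmpID": 1, "ProjNumber": 20, "TeamNum": 200},
]

# TeamNum alone determines ProjNumber here, so with (EmpID, TeamNum) as a
# candidate key, ProjNumber depends on only part of that key.
team_determines_project = holds(assignments, ["TeamNum"], ["ProjNumber"])
emp_determines_project = holds(assignments, ["EmpID"], ["ProjNumber"])
```

Sample data can only refute a functional dependency, never prove it; whether TeamNum really determines ProjNumber is a business rule that has to come from the requirements.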
I'm currently revising for an upcoming Database Management exam.
There is a question on denormalising the sample employees database. The question is as follows, the schema is also shown below:
Question:
Use the 'employees' database to de-normalise any two (or more) tables to produce a table in 2NF. You must precisely explain why the table is in 2NF and not 3NF.
Schema:
My Answer:
I would denormalise the 'salaries' table into the 'employees' table. To denormalise, I would move {salary, from_date, to_date} into the employees table and remove the 'salaries' table. Note: from_date is no longer part of the primary key in 'employees'.
The 'employees' table is no longer in 3NF and is now in 2NF. This is because a 'transitive dependency' has been introduced into the table.
The transitive dependency is as follows: 'salary' depends on 'from_date'. It is transitive and not partial because 'from_date' is not a component of the primary key. In a partial dependency the determinant must be part of the primary key.
Basically for this question I need to create a transitive dependency. This schema seems a little sparse and I'm also a little thrown off by the fact that dates are part of primary keys.
If the above dependency is wrong, could somebody possibly point one out for me, please?
Another possible solution is to denormalise 'departments' into 'dept_emp' by adding 'dept_name' to 'dept_emp'. But the SQL for this table shows that 'dept_no' is part of the primary key.
Any guidance on this would be greatly appreciated.
I've got a database that doesn't have any foreign keys. I've done some checks and there are a fair few orphaned records.
It's a pretty large database (500+ tables) and I'm looking at the possibility of building the foreign keys back in.
Is there a way to do this other than trawling through every single table over time?
Has anybody ever been through this process before and can maybe offer some insights or tips on how to make the process a little easier.
Any help or advice appreciated.
I assume you mean "doesn't have any foreign key constraints"...if there were no foreign keys, you wouldn't know which records matched at all.
Do the primary and foreign key fields have the same name? As in, the PK table has a "CustomerId" field and the FK table(s) also have a "CustomerId" field? If so, you might be able to query the column properties (perhaps using INFORMATION_SCHEMA, you didn't mention an RDBMS) to figure out some implied relationships. Just query for all the tables that have a field called "CustomerId" that is not a PK and there's a good (but not certain) bet that those tables should have an FK constraint to the Customer table. You could even use the output of the query to generate the DDL to create the constraints.
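As a rough sketch of that name-matching idea: since no RDBMS was mentioned, the example below uses SQLite through Python's standard `sqlite3` module and reads column metadata with `PRAGMA table_info` instead of INFORMATION_SCHEMA. The `Orders` and `Invoice` tables and the `candidate_fks` helper are hypothetical, and the generated DDL follows generic SQL syntax that would need adapting to the actual database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema with no foreign key constraints declared.
conn.executescript("""
    CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders   (OrderId INTEGER PRIMARY KEY, CustomerId INTEGER);
    CREATE TABLE Invoice  (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER);
""")


def candidate_fks(conn, pk_table, pk_column):
    """List tables that have a non-PK column named like pk_table's key."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    found = []
    for table in tables:
        if table == pk_table:
            continue
        # table_info rows are (cid, name, type, notnull, dflt_value, pk).
        for _cid, name, _type, _notnull, _default, pk in conn.execute(
                f'PRAGMA table_info("{table}")'):
            if name == pk_column and pk == 0:
                found.append(table)
    return found


candidates = candidate_fks(conn, "Customer", "CustomerId")

# Generate (but do not run) constraint DDL for a human to review.
ddl = [
    f"ALTER TABLE {t} ADD CONSTRAINT FK_{t}_Customer "
    f"FOREIGN KEY (CustomerId) REFERENCES Customer (CustomerId);"
    for t in candidates
]
```

The output is only a list of guesses; each one still needs to be confirmed before the constraint is created.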
You can work from the largest to smallest tables, or start with the least performant area of the database. Adding keys should help your performance significantly, but you'll have to resolve the orphan rows first. You may need input from the business for that. Expect them to be very confused about what's going on.
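Orphan rows can be found with an anti-join before a constraint is added. This is a minimal sketch in SQLite via Python's `sqlite3`; the tables and data are hypothetical, and it assumes a NULL reference is acceptable and therefore not counted as an orphan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: order 12 points at a customer that does not exist.
conn.executescript("""
    CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders   (OrderId INTEGER PRIMARY KEY, CustomerId INTEGER);
    INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Orders VALUES (10, 1), (11, 2), (12, 99), (13, NULL);
""")

# LEFT JOIN keeps every order; rows with no matching customer come back
# with c.CustomerId NULL. NULL references are excluded on purpose.
orphans = conn.execute("""
    SELECT o.OrderId, o.CustomerId
    FROM Orders AS o
    LEFT JOIN Customer AS c ON c.CustomerId = o.CustomerId
    WHERE o.CustomerId IS NOT NULL
      AND c.CustomerId IS NULL
    ORDER BY o.OrderId
""").fetchall()
```

What to do with the rows this returns (delete them, repoint them, or create placeholder parents) is the part that usually needs the business's input.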
Should I always have a primary key in my database tables?
Let's take the SO tagging. You can see the tags in any revision, so they're likely to be in a tag_rev table with the postID and revision number. Would I need a PK for that?
Also, since they are in a rev table and not currently in use, should the tags be a blob of tagIDs instead of multiple entries of post_id/tagid pairs?
A table should have a primary key so that you could identify each row uniquely with it.
Technically, you can have tables without a primary key, but you'll be breaking good database design rules.
You should strive to have a primary key in any non-trivial table where you're likely to want to access (or update or delete) individual records by that key. Primary keys can consist of multiple columns, and formally speaking, will be the shortest available superkey; that is, the shortest available group of columns which, together, uniquely identify any row.
I don't know what the Stack Overflow database schema looks like (and from some of the things I've read on Jeff's blog, I don't want to), but in the situation you describe, it's entirely possible there is a primary key across the post identifier, revision number and tag value; certainly, that would be the shortest (and only) superkey available.
With regards to your second point, while it may be reasonable to argue in favour of aggregating values in archive tables, it does go against the principle that each row/column intersection in a table ought to contain one single value. While it may slightly simplify development, there is no reason you can't keep to a normalised table with versioned metadata, even for something as trivial as tags.
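A normalised revision table of the kind described above might look like the following sketch. The table and column names (`tag_rev`, `post_id`, `revision`, `tag_id`) are assumptions for illustration, not the real Stack Overflow schema, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (post, revision, tag); the three columns together are the key.
conn.execute("""
    CREATE TABLE tag_rev (
        post_id  INTEGER NOT NULL,
        revision INTEGER NOT NULL,
        tag_id   INTEGER NOT NULL,
        PRIMARY KEY (post_id, revision, tag_id)
    )
""")
conn.executemany(
    "INSERT INTO tag_rev VALUES (?, ?, ?)",
    [(1, 1, 7), (1, 1, 8), (1, 2, 7)],
)

# The composite key rejects the same tag twice on the same revision.
try:
    conn.execute("INSERT INTO tag_rev VALUES (1, 1, 7)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Each row/column intersection holds a single value, so the tags of one
# revision can be queried directly instead of parsing a blob.
tags_at_rev1 = [r[0] for r in conn.execute(
    "SELECT tag_id FROM tag_rev "
    "WHERE post_id = 1 AND revision = 1 ORDER BY tag_id")]
```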
I tend to agree that most tables should have a primary key. I can only think of two times where it doesn't make sense to do it.
If you have a table that relates keys to other keys. For example, to relate a user_id to an answer_id, that table wouldn't need a primary key.
A logging table, whose only real purpose is to create an audit trail.
Basically, if you are writing a table that may ever need to be referenced in a foreign key relationship then a primary key is important, and if you can't be positive it won't be, then just add the PK. :)
See this related question about whether an integer primary key is required. One of the answers uses tagging as an example:
Are there any good reasons to have a database table without an integer primary key
For more discussion of tagging and keys, see this question:
Id for tags in tag systems
From MySQL 5.5 Reference Manual section 13.1.17:
If you do not have a PRIMARY KEY and an application asks for the PRIMARY KEY in your tables, MySQL returns the first UNIQUE index that has no NULL columns as the PRIMARY KEY.
So, technically, the answer is no. However, as others have stated, in most cases it is quite useful.
I firmly believe every table should have a way to uniquely identify a record. For 99% of the tables, this is a primary key. For the rest you may get away with a unique index (I'm thinking of one-column lookup-type tables here). Any time I have had to work with a table without a way to uniquely identify records, there has been trouble.
I also believe that if you are using surrogate keys as your PK, you should, where at all possible, have a separate unique index on whatever combination of fields makes up the natural key. I realize there are all too many times when you don't have a true natural key (names are not unique, or what makes something unique might be spread across several parent/child tables), but if you do have one, please please please make sure it has a unique index or is created as the PK.
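A minimal sketch of a surrogate key paired with a unique index on the natural key, using SQLite via Python's `sqlite3`. The `country` table and the choice of `iso_code` as its natural key are hypothetical examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (
        country_id INTEGER PRIMARY KEY,   -- surrogate key
        iso_code   TEXT NOT NULL,         -- natural key (assumed)
        name       TEXT NOT NULL
    );
    CREATE UNIQUE INDEX ux_country_iso_code ON country (iso_code);
""")
conn.execute(
    "INSERT INTO country (iso_code, name) VALUES ('NZ', 'New Zealand')")

# Without the unique index, this row would be accepted and simply get a
# new surrogate id, leaving two records for the same real-world thing.
try:
    conn.execute(
        "INSERT INTO country (iso_code, name) VALUES ('NZ', 'Duplicate row')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM country").fetchone()[0]
```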
If there is no PK, how will you update or delete a single row? It would be impossible! To be honest, I have a few times used tables without a PK, for instance to store activity logs, but even in this case it is advisable to have one because the timestamps might not be granular enough. Temporary tables are another example. But according to relational theory the PK is mandatory.
It is good to have keys and relationships. It helps a lot. However, if your app is good enough to handle the relationships, then you could possibly skip the keys (although I recommend that you have them).
Since I use Subsonic, I always create a primary key for all of my tables. Many DB Abstraction libraries require a primary key to work.
Note: that doesn't answer the "Grand Unified Theory" tone of your question, but I'm just saying that in practice, sometimes you MUST make a primary key for every table.
If it's a join table then I wouldn't say that you need a primary key. Suppose, for example, that you have tables PERSONS, SICKPEOPLE, and ILLNESSES. The ILLNESSES table has things like flu, cold, etc., each with a primary key. PERSONS has the usual stuff about people, each also with a primary key. The SICKPEOPLE table only has people in it who are sick, and it has two columns, PERSONID and ILLNESSID, foreign keys back to their respective tables, and no primary key. The PERSONS and ILLNESSES tables contain entities and entities get primary keys. The entries in the SICKPEOPLE table aren't entities and don't get primary keys.
Databases don't have keys, per se, but their constituent tables might. I assume you mean that, but just in case...
Anyway, tables with a large number of rows should absolutely have primary keys; tables with only a few rows don't need them, necessarily, though they don't hurt. It depends upon the usage and the size of the table. Purists will put primary keys in every table. This is not wrong; and neither is omitting PKs in small tables.
Edited to add a link to my blog entry on this question, in which I discuss a case in which database administration staff did not consider it necessary to include a primary key in a particular table. I think this illustrates my point adequately.
Cyberherbalist's Blog Post on Primary Keys