SQL Server - database design - one to many OR many to many - sql-server

I'm wanting advice as to the best way to design my database - it is storing medical data. I have a number of different entities (tables) that may have one or more medications associated with them. These are always 1 to many relationships, and the medications are only ever related to a single entity (ie. they are not shared). The columns for the Medication data are common.
My question is, should I have a single Medication table (and use numerous many-to-many mapping tables) OR should I use multiple Medication tables?
Option 1 - single Medication table:
[table1]1---*[table1_has_medication]*---1[medication]
[table2]1---*[table2_has_medication]*---1[medication]
[table3]1---*[table3_has_medication]*---1[medication]
Option 2 - multiple Medication tables:
[table1]1---*[table1Medication]
[table2]1---*[table2Medication]
[table3]1---*[table3Medication]
Option 1 seems neater as all Medication data is in a single table. However, a Medication is in fact only ever related to a single table so it's not a true many-to-many relationship. Also, I assume I can't support cascaded deletes for many-to-many relationships so I need to be careful of "orphaned" Medication records.
I'm interested in the opinions of experienced database designers. Thank you.

In addition to not representing your requirements accurately, a single many-to-many (aka. "junction" or "link") table has another problem: one FK can only reference one table, so either you'll have to use multiple exclusive FKs, or you'll have to enforce referential integrity yourself, which is harder to do properly than it looks.
All in all, looks like separate medication tables are what you need.
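As a rough illustration of Option 2, here is a minimal sketch of one entity table with its own medication table. It uses Python's built-in sqlite3 module as a stand-in for SQL Server so that it can be run as-is. The column names (name, dose) are hypothetical, and the DDL would need adapting to T-SQL types.

```python
import sqlite3

# isolation_level=None keeps the connection in autocommit mode.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked to

conn.executescript("""
    CREATE TABLE table1 (
        id INTEGER PRIMARY KEY
    );
    -- One medication table per owning entity (Option 2).
    CREATE TABLE table1_medication (
        id        INTEGER PRIMARY KEY,
        table1_id INTEGER NOT NULL
                  REFERENCES table1 (id) ON DELETE CASCADE,
        name      TEXT NOT NULL,  -- hypothetical medication columns
        dose      TEXT
    );
""")

conn.execute("INSERT INTO table1 (id) VALUES (1)")
conn.execute(
    "INSERT INTO table1_medication (table1_id, name, dose) VALUES (1, 'aspirin', '75 mg')"
)

# Deleting the parent row removes its medications, so nothing is orphaned.
conn.execute("DELETE FROM table1 WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM table1_medication").fetchone()[0]

# A medication that points at a missing parent is rejected outright.
try:
    conn.execute("INSERT INTO table1_medication (table1_id, name) VALUES (42, 'ibuprofen')")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False
```

Because each medication row carries a plain foreign key to its single owner, the cascade and the orphan check both come from the declaration, with no mapping table involved.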
NOTE: That could potentially become a problem if your requirements evolve and you suddenly have to reference all medications from another table. If that happens, consider "inheriting" all medication tables from a common table.

Found a suitable answer on DBA stackexchange.
Repeated below:
Relational databases are not built to handle this situation perfectly. You have to decide what is most important to you and then make your trade-offs. You have several goals:
Maintain third normal form
Maintain referential integrity
Maintain the constraint that each account belongs to either a corporation or a natural person.
Preserve the ability to retrieve data simply and directly
The problem is that some of these goals compete with one another.
Sub-Typing Solution
You could choose a sub-typing solution where you create a super-type that incorporates both corporations and persons. This super-type would probably have a compound key of the natural key of the sub-type plus a partitioning attribute (e.g. customer_type). This is fine as far as normalization goes and it allows you to enforce referential integrity as well as the constraint that corporations and persons are mutually exclusive. The problem is that this makes data retrieval more difficult, because you always have to branch based on customer_type when you join account to the account holder. This probably means using UNION and having a lot of repetitive SQL in your query.
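One possible reading of the sub-typing approach is sketched below, using Python's sqlite3 module so it can be run directly. The table and column names (customer, corporation, person, account, customer_type) are assumptions made for illustration, not names taken from the original question.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Super-type: compound key of the id plus the partitioning attribute.
    CREATE TABLE customer (
        id            INTEGER NOT NULL,
        customer_type TEXT    NOT NULL CHECK (customer_type IN ('C', 'P')),
        PRIMARY KEY (id, customer_type)
    );
    -- Each sub-type pins the partitioning attribute to its own value.
    CREATE TABLE corporation (
        id            INTEGER PRIMARY KEY,
        customer_type TEXT NOT NULL DEFAULT 'C' CHECK (customer_type = 'C'),
        FOREIGN KEY (id, customer_type) REFERENCES customer (id, customer_type)
    );
    CREATE TABLE person (
        id            INTEGER PRIMARY KEY,
        customer_type TEXT NOT NULL DEFAULT 'P' CHECK (customer_type = 'P'),
        FOREIGN KEY (id, customer_type) REFERENCES customer (id, customer_type)
    );
    -- An account references exactly one super-type row.
    CREATE TABLE account (
        id            INTEGER PRIMARY KEY,
        customer_id   INTEGER NOT NULL,
        customer_type TEXT    NOT NULL,
        FOREIGN KEY (customer_id, customer_type) REFERENCES customer (id, customer_type)
    );
    INSERT INTO customer (id, customer_type) VALUES (1, 'C'), (2, 'P');
""")

def accepted(sql):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

corp_ok = accepted("INSERT INTO corporation (id) VALUES (1)")   # customer (1, 'C') exists
corp_bad = accepted("INSERT INTO corporation (id) VALUES (2)")  # customer 2 is a person
acct_ok = accepted("INSERT INTO account (customer_id, customer_type) VALUES (1, 'C')")
acct_bad = accepted("INSERT INTO account (customer_id, customer_type) VALUES (1, 'P')")
```

The mutual exclusivity here comes entirely from keys: a sub-type row can only attach to a super-type row of the matching type. The retrieval cost described above remains, since joining account to its holder still has to branch on customer_type.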
Two Foreign Keys Solution
You could choose a solution where you keep two foreign keys in your account table, one to corporation and one to person. This solution also allows you to maintain referential integrity, normalization and mutual exclusivity. It also has the same data retrieval drawback as the sub-typing solution. In fact, this solution is just like the sub-typing solution except that you get to the problem of branching your joining logic "sooner".
Nevertheless, a lot of data modellers would consider this solution inferior to the sub-typing solution because of the way that the mutual exclusivity constraint is enforced. In the sub-typing solution you use keys to enforce the mutual exclusivity. In the two foreign key solution you use a CHECK constraint. I know some people who have an unjustified bias against check constraints. These people would prefer the solution that keeps the constraints in the keys.
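A minimal sketch of the two foreign keys variant follows, again in Python with sqlite3 so that it runs as written. The table and column names are hypothetical. The CHECK constraint is written with plain AND/OR so that the same condition should carry over to other SQL dialects with little change.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE corporation (id INTEGER PRIMARY KEY);
    CREATE TABLE person      (id INTEGER PRIMARY KEY);
    CREATE TABLE account (
        id             INTEGER PRIMARY KEY,
        corporation_id INTEGER REFERENCES corporation (id),
        person_id      INTEGER REFERENCES person (id),
        -- Exactly one of the two owner columns must be filled in.
        CHECK (
            (corporation_id IS NOT NULL AND person_id IS NULL)
         OR (corporation_id IS NULL AND person_id IS NOT NULL)
        )
    );
    INSERT INTO corporation (id) VALUES (1);
    INSERT INTO person (id) VALUES (1);
""")

def try_insert(corporation_id, person_id):
    """Return True if the account row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO account (corporation_id, person_id) VALUES (?, ?)",
            (corporation_id, person_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False

only_corporation = try_insert(1, None)   # accepted
only_person = try_insert(None, 1)        # accepted
both_owners = try_insert(1, 1)           # rejected by the CHECK constraint
no_owner = try_insert(None, None)        # rejected by the CHECK constraint
missing_parent = try_insert(2, None)     # rejected by the foreign key
```

Both foreign keys stay fully declarative; only the "exactly one owner" rule moves from the keys into the CHECK constraint.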
"Denormalized" Partitioning Attribute Solution
There is another option where you keep a single foreign key column on the chequing account table and use another column to tell you how to interpret the foreign key column (RoKa's OwnerTypeID column). This essentially eliminates the super-type table in the sub-typing solution by denormalizing the partitioning attribute to the child table. (Note that this is not strictly "denormalization" according to the formal definition, because the partitioning attribute is part of a primary key.) This solution seems quite simple since it avoids having an extra table to do more or less the same thing and it cuts the number of foreign key columns down to one. The problem with this solution is that it doesn't avoid the branching of retrieval logic and what's more, it doesn't allow you to maintain declarative referential integrity. SQL databases don't have the ability to manage a single foreign key column being for one of multiple parent tables.
Shared Primary Key Domain Solution
One way that people sometimes deal with this issue is to use a single pool of IDs so that there is no confusion for any given ID whether it belongs to one sub-type or another. This would probably work pretty naturally in a banking scenario, since you aren't going to issue the same bank account number to both a corporation and a natural person. This has the advantage of avoiding the need for a partitioning attribute. You could do this with or without a super-type table. Using a super-type table allows you to use declarative constraints to enforce uniqueness. Otherwise this would have to be enforced procedurally. This solution is normalized but it won't allow you to maintain declarative referential integrity unless you keep the super-type table. It still does nothing to avoid complex retrieval logic.
You can see therefore that it isn't really possible to have a clean design that follows all of the rules, while at the same time keeping your data retrieval simple. You have to decide where your trade-offs are going to be.


What are the advantages of using foreign keys, and the disadvantages of not using them? [duplicate]

I remember hearing Joel Spolsky mention in podcast 014 that he'd barely ever used a foreign key (if I remember correctly). However, to me they seem pretty vital to avoid duplication and subsequent data integrity problems throughout your database.
Do people have some solid reasons as to why (to avoid a discussion in lines with Stack Overflow principles)?
Edit: "I've yet to have a reason to create a foreign key, so this might be my first reason to actually set up one."
Reasons to use Foreign Keys:
you won't get Orphaned Rows
you can get nice "on delete cascade" behavior, automatically cleaning up tables
knowing about the relationships between tables in the database helps the Optimizer plan your queries for most efficient execution, since it is able to get better estimates on join cardinality.
FKs give a pretty big hint on what statistics are most important to collect on the database, which in turn leads to better performance
they enable all kinds of auto-generated support -- ORMs can generate themselves, visualization tools will be able to create nice schema layouts for you, etc.
someone new to the project will get into the flow of things faster since otherwise implicit relationships are explicitly documented
Reasons not to use Foreign Keys:
you are making the DB work extra on every CRUD operation because it has to check FK consistency. This can be a big cost if you have a lot of churn
by enforcing relationships, FKs specify an order in which you have to add/delete things, which can lead to refusal by the DB to do what you want. (Granted, in such cases, what you are trying to do is create an Orphaned Row, and that's not usually a good thing). This is especially painful when you are doing large batch updates, and you load up one table before another, with the second table creating consistent state (but should you be doing that sort of thing if there is a possibility that the second load fails and your database is now inconsistent?).
sometimes you know beforehand your data is going to be dirty, you accept that, and you want the DB to accept it
you are just being lazy :-)
I think (I am not certain!) that most established databases provide a way to specify a foreign key that is not enforced, and is simply a bit of metadata. Since non-enforcement wipes out every reason not to use FKs, you should probably go that route if any of the reasons in the second section apply.
This is an issue of upbringing. If somewhere in your educational or professional career you spent time feeding and caring for databases (or worked closely with talented folks who did), then the fundamental tenets of entities and relationships are well-ingrained in your thought process. Among those rudiments is how/when/why to specify keys in your database (primary, foreign and perhaps alternate). It's second nature.
If, however, you've not had such a thorough or positive experience in your past with RDBMS-related endeavors, then you've likely not been exposed to such information. Or perhaps your past includes immersion in an environment that was vociferously anti-database (e.g., "those DBAs are idiots - we few, we chosen few java/c# code slingers will save the day"), in which case you might be vehemently opposed to the arcane babblings of some dweeb telling you that FKs (and the constraints they can imply) really are important if you'd just listen.
Most everyone was taught when they were kids that brushing your teeth was important. Can you get by without it? Sure, but somewhere down the line you'll have fewer teeth available than you could have if you had brushed after every meal. If moms and dads were responsible enough to cover database design as well as oral hygiene, we wouldn't be having this conversation. :-)
I'm sure there are plenty of applications where you can get away with it, but it's not the best idea. You can't always count on your application to properly manage your database, and frankly managing the database should not be of very much concern to your application.
If you are using a relational database then it seems you ought to have some relationships defined in it. Unfortunately this attitude (you don't need foreign keys) seems to be embraced by a lot of application developers who would rather not be bothered with silly things like data integrity (but need to because their companies don't have dedicated database developers). Usually in databases put together by these types you are lucky just to have primary keys ;)
Foreign keys are essential to any relational database model.
I always use them, but then I make databases for financial systems. The database is the critical part of the application. If the data in a financial database isn't totally accurate then it really doesn't matter how much effort you put into your code/front-end design. You're just wasting your time.
There's also the fact that multiple systems generally need to interface directly with the database - from other systems that just read data out (Crystal Reports) to systems that insert data (not necessarily using an API I've designed; it may be written by a dull-witted manager who has just discovered VBScript and has the SA password for the SQL box). If the database isn't as idiot-proof as it can possibly be, well - bye bye database.
If your data is important, then yes, use foreign keys, create a suite of stored procedures to interact with the data, and make the toughest DB you can. If your data isn't important, why are you making a database to begin with?
Update: I always use foreign keys now. My answer to the objection "they complicate testing" is "write your unit tests so they don't need the database at all. Any tests that use the database should use it properly, and that includes foreign keys. If the setup is painful, find a less painful way to do the setup."
Foreign keys complicate automated testing
Suppose you're using foreign keys. You're writing an automated test that says "when I update a financial account, it should save a record of the transaction." In this test, you're only concerned with two tables: accounts and transactions.
However, accounts has a foreign key to contracts, and contracts has a fk to clients, and clients has a fk to cities, and cities has a fk to states.
Now the database will not allow you to run your test without setting up data in four tables that aren't related to your test.
There are at least two possible perspectives on this:
"That's a good thing: your test should be realistic, and those data constraints will exist in production."
"That's a bad thing: you should be able to unit test pieces of the system without involving other pieces. You can add integration tests for the system as a whole."
It may also be possible to temporarily turn off foreign key checks while running tests. MySQL, at least, supports this.
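In MySQL the switch is the session variable `SET FOREIGN_KEY_CHECKS = 0`. The sketch below shows the same idea with SQLite's equivalent, `PRAGMA foreign_keys`, because that can be run from Python without a server. The helper name `foreign_keys_disabled` and the three-table schema are made up for the example.

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def foreign_keys_disabled(conn):
    """Temporarily switch off FK enforcement on this connection (test setup only)."""
    # The pragma is ignored inside an open transaction, so use autocommit mode.
    conn.execute("PRAGMA foreign_keys = OFF")
    try:
        yield conn
    finally:
        conn.execute("PRAGMA foreign_keys = ON")

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE contracts (id INTEGER PRIMARY KEY);
    CREATE TABLE accounts (
        id          INTEGER PRIMARY KEY,
        contract_id INTEGER NOT NULL REFERENCES contracts (id)
    );
    CREATE TABLE transactions (
        id         INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL REFERENCES accounts (id),
        amount     INTEGER NOT NULL
    );
""")

# The test only cares about accounts and transactions, so the contract row is skipped.
with foreign_keys_disabled(conn):
    conn.execute("INSERT INTO accounts (id, contract_id) VALUES (1, 999)")
    conn.execute("INSERT INTO transactions (account_id, amount) VALUES (1, 50)")

transaction_count = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]
fk_enabled_again = conn.execute("PRAGMA foreign_keys").fetchone()[0]
```

Rows inserted while checks are off are not re-validated when enforcement is switched back on, so this belongs in throwaway test databases rather than anything holding real data.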
"They can make deleting records more cumbersome - you can't delete the "master" record where there are records in other tables where foreign keys would violate that constraint."
It's important to remember that the SQL standard defines actions that are taken when a referenced row is deleted or updated.
The ones I know of are:
ON DELETE RESTRICT - Prevents a row in the other table from being deleted while rows in this table still reference it. This is what Ken Ray described above.
ON DELETE CASCADE - If a row in the other table is deleted, delete any rows in this table that reference it.
ON DELETE SET DEFAULT - If a row in the other table is deleted, set any foreign keys referencing it to the column's default.
ON DELETE SET NULL - If a row in the other table is deleted, set any foreign keys referencing it in this table to null.
ON DELETE NO ACTION - Takes no compensating action. The delete is still rejected if it would leave rows in this table referencing a missing row; it differs from RESTRICT mainly in when the check happens (at the end of the statement, or at commit for deferred constraints, rather than immediately).
These same actions also apply to ON UPDATE.
The default in the SQL standard is NO ACTION, but not every DBMS supports every action, so check the documentation of the one you're using.
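To see the actions side by side, here is a small experiment in Python using sqlite3, which accepts all five keywords. The parent/child schema and the helper `delete_parent` are invented for the demonstration, and other database systems may accept only a subset of these actions.

```python
import sqlite3

ACTIONS = ["RESTRICT", "NO ACTION", "CASCADE", "SET NULL", "SET DEFAULT"]

def delete_parent(action):
    """Delete a referenced parent row and report what happened to the child row."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys = ON")
    # `action` only ever comes from the fixed ACTIONS list above.
    conn.executescript(f"""
        CREATE TABLE parent (id INTEGER PRIMARY KEY);
        CREATE TABLE child (
            id        INTEGER PRIMARY KEY,
            parent_id INTEGER DEFAULT 0
                      REFERENCES parent (id) ON DELETE {action}
        );
        INSERT INTO parent (id) VALUES (0), (1);
        INSERT INTO child (id, parent_id) VALUES (10, 1);
    """)
    try:
        conn.execute("DELETE FROM parent WHERE id = 1")
    except sqlite3.IntegrityError:
        return "delete rejected"
    return conn.execute("SELECT parent_id FROM child").fetchall()

results = {action: delete_parent(action) for action in ACTIONS}
for action, outcome in results.items():
    print(f"ON DELETE {action}: {outcome}")
```

RESTRICT and NO ACTION both refuse the delete, CASCADE leaves no child rows, SET NULL leaves the child with a null parent_id, and SET DEFAULT repoints it at parent 0, which has to exist for the delete to succeed.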
@imphasing - this is exactly the kind of mindset that causes maintenance nightmares.
Why oh why would you ignore declarative referential integrity, where the data can be guaranteed to be at least consistent, in favour of so-called "software enforcement", which is a weak preventative measure at best?
There's one good reason not to use them: If you don't understand their role or how to use them.
In the wrong situations, foreign key constraints can lead to waterfall replication of accidents. If somebody removes the wrong record, undoing it can become a mammoth task.
Also, conversely, when you need to remove something, if poorly designed, constraints can cause all sorts of locks that prevent you.
There are no good reasons not to use them... unless orphaned rows aren't a big deal to you I guess.
"Before adding a record, check that a corresponding record exists in another table" is business logic.
Here are some reasons you don't want this in the database:
If the business rules change, you have to change the database. The database will need to recreate the index in a lot of cases and this is slow on large tables. (Changing rules include: allow guests to post messages or allow users to delete their account despite having posted comments, etc).
Changing the database is not as easy as deploying a software fix by pushing the changes to the production repository. We want to avoid changing the database structure as much as possible. The more business logic there is in the database, the more you increase the chances of needing to change the database (and triggering re-indexing).
TDD. In unit tests you can substitute the database for mocks and test the functionality. If you have any business logic in your database, you are not doing complete tests and would need to either test with the database or replicate the business logic in code for testing purposes, duplicating the logic and increasing the likelihood of the logic not working in the same way.
Reusing your logic with different data sources. If there is no logic in the database, my application can create objects from records from the database, create them from a web service, a json file or any other source. I just need to swap out the data mapper implementation and can use all my business logic with any source. If there is logic in the database, this isn't possible and you have to implement the logic at the data mapper layer or in the business logic. Either way, you need those checks in your code. If there's no logic in the database I can deploy the application in different locations using different database or flat-file implementations.
From my experience it's always better to avoid using FKs in database-critical applications. I would not disagree with the people here who say FKs are good practice, but they're not practical where the database is huge and handles a huge number of CRUD operations per second. I can share without naming ... one of the biggest investment banks doesn't have a single FK in its databases. These constraints are handled by programmers when creating applications involving the DB. The basic reason is that whenever a new CRUD operation is done, it has to affect multiple tables and verify each insert/update. This won't be a big issue for queries affecting single rows, but it does create huge latency when you deal with batch processing, which any big bank has to do as a daily task.
It's better to avoid FKs, but the risk then has to be handled by programmers.
Bigger question is: would you drive with a blindfold on? That’s how it is if you develop a system without referential constraints. Keep in mind, that business requirements change, application design changes, respective logical assumptions in the code changes, logic itself can be refactored, and so on. In general, constraints in databases are put in place under contemporary logical assumptions, seemingly correct for particular set of logical assertions and assumptions.
Through the lifecycle of an application, referential and data checks constraints police data collection via the application, especially when new requirements drive logical application changes.
To the subject of this listing - a foreign key does not by itself "improve performance", nor does it "degrade performance" significantly from the standpoint of a real-time transaction processing system. However, there is an aggregated cost for constraint checking in a HIGH volume "batch" system. So here is the difference between real-time and batch transaction processing: in batch processing, the aggregated cost incurred by constraint checks on a sequentially processed batch poses a performance hit.
In a well designed system, data consistency checks would be done "before" processing a batch through (nevertheless, there is a cost associated here also); therefore, foreign key constraint checks are not required during load time. In fact all constraints, including foreign key, should be temporarily disabled till the batch is processed.
QUERY PERFORMANCE - if tables are joined on foreign keys, be aware that many DBMSs do NOT automatically index foreign key columns (though the respective primary key is indexed by definition). What helps performance is joining on indexed columns, not the foreign key constraint itself, so index the foreign key columns you join on.
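The point about indexing can be checked with a quick experiment. The sketch below uses SQLite through Python's sqlite3 module, which also leaves foreign key columns unindexed unless told otherwise. The customers/orders schema and the index name are hypothetical, and the exact wording of the query plan differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (id)
    );
""")

def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN as one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM orders WHERE customer_id = 1"

# Declaring the foreign key did not create an index: the lookup scans the table.
plan_before = plan(query)

# Index the foreign key column explicitly.
conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")
plan_after = plan(query)

print("before:", plan_before)
print("after: ", plan_after)
```

After the index is created, the plan mentions idx_orders_customer_id, whereas the first plan is a scan of the orders table.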
Changing subjects, if a database is just supporting website display/rendering content/etc and recording clicks, then a database with full constraints on all tables is overkill for such purposes. Think about it. Most websites don’t even use a database for such. For similar requirements, where data is just being recorded and not referenced per se, use an in-memory database, which does not have constraints. This doesn’t mean that there is no data model, yes logical model, but no physical data model.
I agree with the previous answers in that they are useful to maintain data consistency. However, there was an interesting post by Jeff Atwood some weeks ago that discussed the pros and cons of normalized and consistent data.
In a few words, a denormalized database can be faster when handling huge amounts of data; and you may not care about precise consistency depending on the application, but it forces you to be much more careful when dealing with data, as the DB won't be.
The Clarify database is an example of a commercial database that has no primary or foreign keys.
http://www.geekinterview.com/question_details/18869
The funny thing is, the technical documentation goes to great lengths to explain how tables are related, what columns to use to join them etc.
In other words, they could have joined the tables with explicit declarations (DRI) but they chose not to.
Consequently, the Clarify database is full of inconsistencies and it underperforms.
But I suppose it made the developers' job easier, not having to write code to deal with referential integrity, such as checking for related rows before deleting or adding.
And that, I think, is the main benefit of not having foreign key constraints in a relational database. It makes it easier to develop, at least that is from a devil-may-care point of view.
If you are absolutely sure that the one underlying database system will not change in the future, I would use foreign keys to ensure data integrity.
But here is another very good real-life reason not to use foreign keys at all:
You are developing a product, which should support different database systems.
If you are working with the Entity Framework, which is able to connect to many different database systems, you may also want to support "open-source-free-of-charge" serverless databases. Not all of these databases may support your foreign key rules (updating, deleting rows...).
This can lead to different problems:
1.) You may run into errors, when the database structure is created or updated. Maybe there will only be silent errors, because your foreign keys are just ignored by the database system.
2.) If you rely on foreign keys, you will probably make fewer or even no data integrity checks in your business logic. Now, if the new database system does not support these foreign key rules or just behaves in a different way, you have to rewrite your business logic.
You may ask: Who needs different database systems? Well, not everybody can afford or wants a full blown SQL-Server on his machine. This is software, which needs to be maintained. Others already have invested time and money in some other DB system. Serverless database are great for small customers on only one machine.
Nobody knows, how all of these DB systems behave, but your business logic, with integrity checks, always stays the same.
They can make deleting records more cumbersome - you can't delete the "master" record where there are records in other tables where foreign keys would violate that constraint. You can use triggers to have cascading deletes.
If you chose your primary key unwisely, then changing that value becomes even more complex. For example, if I have the PK of my "customers" table as the person's name, and make that key a FK in the "orders" table, if the customer wants to change his name, then it is a royal pain... but that is just shoddy database design.
I believe the advantages of using foreign keys outweigh any supposed disadvantages.
Verifying foreign key constraints takes some CPU time, so some folks omit foreign keys to get some extra performance.
Additional Reason to use Foreign Keys:
- Allows greater reuse of a database
Additional Reason to NOT use Foreign Keys:
- You are trying to lock-in a customer into your tool by reducing reuse.
I know only Oracle databases, no other ones, and I can tell that Foreign Keys are essential for maintaining data integrity. Prior to inserting data, a data structure needs to be made, and be made correctly. When that is done - and thus all primary AND foreign keys are created - the work is done !
Meaning : orphaned rows ? No. Never seen that in my life. Unless a bad programmer forgot the foreign key, or if he implemented that on another level. Both are - in context of Oracle - huge mistakes, which will lead to data duplication, orphan data, and thus : data corruption. I can't imagine a database without FK enforced. It looks like chaos to me. It's a bit like the Unix permission system : imagine that everybody is root. Think of the chaos.
Foreign Keys are essential, just like Primary Keys. It's like saying : what if we removed Primary Keys ? Well, total chaos is going to happen. That's what. You may not move the primary or foreign key responsibility to the programming level, it must be at the data level.
Drawbacks ? Yes, absolutely ! Because on insert, a lot more checks are going to be happening. But, if data integrity is more important than performance, it's a no-brainer. The problem with performance on Oracle is more related to indexes, which come with PK and FK's.
The argument I have heard is that the front-end should have these business rules. Foreign keys "add unnecessary overhead" when you shouldn't be allowing any insertions that break your constraints in the first place. Do I agree with this? No, but that is what I have always heard.
EDIT: My guess is he was referring to foreign key constraints, not foreign keys as a concept.
To me, if you want to go by the ACID standards, it is critical to have foreign keys to ensure referential integrity.
I have to second most of the comments here, Foreign Keys are necessary items to ensure that you have data with integrity. The different options for ON DELETE and ON UPDATE will allow you to get around some of the "down falls" that people mention here regarding their use.
I find that in 99% of all my projects I will have FK's to enforce the integrity of the data, however, there are those rare occasions where I have clients that MUST keep their old data, regardless of how bad it is....but then I spend a lot of time writing code that goes in to only get the valid data anyway, so it becomes pointless.
How about maintainability and constancy across application life cycles? Most data has a longer lifespan than the applications that make use of it. Relationships and data integrity are much too important to leave to the hope that the next dev team gets it right in the app code. If you haven't worked on a db with dirty data that doesn't respect the natural relationships, you will. The importance of data integrity will then become very clear.
I also think that foreign keys are a necessity in most databases. The only drawback (besides the performance hit that comes with having enforced consistency) is that having a foreign key allows people to write code that assumes there is a functional foreign key. That should never be allowed.
For example, I've seen people write code that inserts into the referenced table and then attempts inserts into the referencing table without verifying the first insert was successful. If the foreign key is removed at a later time, that results in an inconsistent database.
You also don't have the option of assuming a specific behavior on update or delete. You still need to write your code to do what you want regardless of whether there is a foreign key present. If you assume deletes are cascaded when they are not, your deletes will fail. If you assume updates to the referenced columns are propagated to the referencing rows when they are not, your updates will fail. For the purposes of writing code, you might as well not have those features.
If those features are turned on, then your code will emulate them anyway and you'll lose a little performance.
So, the summary.... Foreign keys are essential if you need a consistent database. Foreign keys should never be assumed to be present or functional in code that you write.
I echo the answer by Dmitriy - very well put.
For those who are worried about the performance overhead FK's often bring, there's a way (in Oracle) you can get the query optimiser advantage of the FK constraint without the cost overhead of constraint validation during insert, delete or update. That is to create the FK constraint with the attributes RELY DISABLE NOVALIDATE. This means the query optimiser ASSUMES that the constraint has been enforced when building queries, without the database actually enforcing the constraint. You have to be very careful here to take the responsibility when you populate a table with an FK constraint like this to make absolutely sure you don't have data in your FK column(s) that violate the constraint, as if you do so you could get unreliable results from queries that involve the table this FK constraint is on.
I usually use this strategy on some tables in my data mart schema, but not in my integrated staging schema. I make sure the tables I am copying data from already have the same constraint enforced, or the ETL routine enforces the constraint.
Many of the people answering here get too hung up on the importance of referential integrity implemented via referential constraints. Working on large databases with referential integrity just does not perform well. Oracle seems particularly bad at cascading deletes. My rule of thumb is that applications should never update the database directly; updates should go through a stored procedure. This keeps the code base inside the database, and means that the database maintains its integrity.
Where many applications may be accessing the database, problems do arise because of referential integrity constraints but this is down to a control.
There is a wider issue too in that, application developers may have very different requirements that database developers may not necessarily be that familiar with.
I have heard this argument too - from people who forgot to put an index on their foreign keys and then complained that certain operations were slow (because constraint checking could take advantage of any index). So to sum up: There is no good reason not to use foreign keys. All modern databases support cascaded deletes, so...
One time when an FK might cause you a problem is when you have historical data that references the key (in a lookup table) even though you no longer want the key available.
Obviously the solution is to design things better up front, but I am thinking of real world situations here where you don't always have control of the full solution.
For example: perhaps you have a lookup table customer_type that lists different types of customers. Let's say you need to remove a certain customer type, but (due to business constraints) you aren't able to update the client software, and nobody envisaged this situation when developing the software. The fact that it is a foreign key in some other table may prevent you from removing the row, even though you know the historical data that references it is irrelevant.
After being burnt with this a few times you probably lean away from db enforcement of relationships.
(I'm not saying this is good - just giving a reason why you may decide to avoid FKs and db constraints in general)
I'll echo what Dmitriy said, but adding on a point.
I worked on a batch billing system that needed to insert large sets of rows on 30+ tables. We weren't allowed to do a data pump (Oracle) so we had to do bulk inserts. Those tables had foreign keys on them, but we had already ensured that they were not breaking any relationships.
Before insert, we disable the foreign key constraints so that Oracle doesn't take forever doing the inserts. After the insert is successful, we re-enable the constraints.
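In Oracle this is typically done with `ALTER TABLE ... DISABLE CONSTRAINT` and `ENABLE CONSTRAINT`. As a runnable approximation, the sketch below follows the same disable, load, verify, re-enable sequence using SQLite from Python. The invoice tables and the `bulk_load` helper are hypothetical, and the verification step uses SQLite's `PRAGMA foreign_key_check`.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE invoice (id INTEGER PRIMARY KEY);
    CREATE TABLE invoice_line (
        id         INTEGER PRIMARY KEY,
        invoice_id INTEGER NOT NULL REFERENCES invoice (id)
    );
""")

def bulk_load(invoices, lines):
    """Load rows with FK checks off, verify them in one pass, then re-enable checks.

    Returns the list of violations found; the batch is rolled back if there are any.
    """
    conn.execute("PRAGMA foreign_keys = OFF")  # must happen outside a transaction
    conn.execute("BEGIN")
    try:
        # Child rows can go in before their parents while checks are off.
        conn.executemany("INSERT INTO invoice_line (id, invoice_id) VALUES (?, ?)", lines)
        conn.executemany("INSERT INTO invoice (id) VALUES (?)", invoices)
        violations = conn.execute("PRAGMA foreign_key_check").fetchall()
        conn.execute("ROLLBACK" if violations else "COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    finally:
        conn.execute("PRAGMA foreign_keys = ON")
    return violations

good = bulk_load(invoices=[(1,)], lines=[(10, 1)])
bad = bulk_load(invoices=[(2,)], lines=[(11, 99)])  # line 11 points at a missing invoice
line_count = conn.execute("SELECT COUNT(*) FROM invoice_line").fetchone()[0]
```

Checking the whole batch once before committing keeps the per-row cost out of the load while still refusing data that would break a relationship.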
PS: In a large database with many foreign keys and child row data for a single record, sometimes foreign keys can be bad, and you may want to disallow cascading deletes. For us in the billing system, it would take too long and be too taxing on the database if we did cascading deletes, so we just mark the record as bad with a field on the main driver (parent) table.

How do I properly design a database? Foreign Keys vs Secondary Keys?

Here are some generic tables. I am trying to fully understand how to properly set up database tables. Are these set up correctly? I want to be able to look up a user's Items and Item Details as fast as possible. FYI for this example ItemDetailsX do not share the same data fields.
I am a little bit stuck on Foreign Keys and Secondary keys. When do you use a Secondary Key vs a Foreign Key?
tbl_Users 1:* tbl_Item //relationship
tbl_Item 1:1 tbl_ItemDetail1 & tbl_ItemDetail2 // relationship
tbl_Item 1:N tbl_ItemDetail3 //relationship
tbl_Users
-UserID - PK
tbl_Item
-ItemID - PK
-UserID - FK
tbl_ItemDetail1
-ItemDetail1ID - PK //Do I even need this if I have ItemID? Its a 1:1 relationship with
-ItemID - FK
-Count
-Duration
-Frequency
tbl_ItemDetail2
-ItemDetail2ID - PK //Do I even need this if I have ItemID? Its a 1:1 relationship with
-ItemID - FK
-OnOff
-Temperature
-Voltage
tbl_ItemDetail3
-ItemDetail3ID - PK //Has a 1:N relationship
-ItemID - FK
-Contrived Value1
-Contrived Value2
EDIT:
Thanks for the replies, I have updated my original post to properly reflect my database.
In the database that I am creating, the Item has ~9 item details. Each item details is 5-15 columns of data.
Having 1 table with like 100 columns does not make sense...?
Databases enforce 3 kinds of declarative integrity:
Integrity of domain - field's type and CHECK constraint.
Integrity of key - PRIMARY KEY or UNIQUE constraint.
Referential integrity - FOREIGN KEY.
A key uniquely identifies rows in the table. All keys are logically equivalent, but for practical reasons one of them is chosen as "primary" and the rest are considered "alternate" (there are some complications involving NULLs, but let's not get into that here).
On the other hand, a FOREIGN KEY is a kind of "pointer" from one table to another, where the DBMS itself guarantees this "pointer" can never "dangle". The foreign key references the (primary or alternate) key in the "parent" table, but the "child" endpoint does not need to be a key itself (and usually isn't).
When a row is modified or deleted from the parent table, this change is either cascaded to the child table (ON [UPDATE/DELETE] [CASCADE/SET NULL/SET DEFAULT]) or the whole operation is blocked (ON [UPDATE/DELETE] RESTRICT).
If a child is inserted or modified, it is checked against the parent table to make sure this new value exists there.
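A minimal sketch of both checks, assuming SQLite through Python's sqlite3 module and hypothetical parent/child tables (the rejected helper is likewise just for illustration). Note that SQLite only enforces foreign keys once the foreign_keys pragma is switched on; other DBMSs enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE parent (parent_id INTEGER PRIMARY KEY);
    CREATE TABLE child (
        child_id  INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parent(parent_id) ON DELETE RESTRICT
    );
    INSERT INTO parent VALUES (1);
    INSERT INTO child  VALUES (100, 1);
""")

def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# A child pointing at a parent that does not exist would "dangle".
dangling_insert_rejected = rejected("INSERT INTO child VALUES (101, 999)")
# Deleting a parent that still has children is blocked by RESTRICT.
parent_delete_rejected = rejected("DELETE FROM parent WHERE parent_id = 1")
```

Both statements are refused by the database itself, with no application code involved.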
The constraints change the meaning of data. Indexes, on the other hand, do not change the meaning of data - they are here purely for performance reasons. Some databases will even allow you to have a key without an underlying index, although this is usually a bad idea performance-wise. An index underneath the primary key is called "primary index" and all other indexes are "secondary".
BTW, there is "secondary index" and there is "alternate key", but there is no such thing as "secondary key".
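To make the distinction concrete, here is a small sketch (SQLite via Python's sqlite3; the app_user table and its columns are made up for the example). The UNIQUE constraint is an alternate key and changes which data is allowed; the plain index on city is a secondary index and constrains nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE app_user (
        user_id INTEGER PRIMARY KEY,     -- primary key
        email   TEXT NOT NULL UNIQUE,    -- alternate key
        city    TEXT
    );
    -- Secondary index: purely for performance, not a key.
    CREATE INDEX idx_app_user_city ON app_user(city);
    INSERT INTO app_user VALUES (1, 'a@example.com', 'Oslo');
""")

# A duplicate city is fine: the index does not constrain the data.
conn.execute("INSERT INTO app_user VALUES (2, 'b@example.com', 'Oslo')")

# A duplicate email violates the alternate key and is refused.
try:
    conn.execute("INSERT INTO app_user VALUES (3, 'a@example.com', 'Rome')")
    duplicate_email_rejected = False
except sqlite3.IntegrityError:
    duplicate_email_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM app_user").fetchone()[0]
```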
I'm not quite sure what is your design goal, but I'm guessing something like this would be a decent starting point:
I see no purpose in extracting details to separate tables if they are always in 1:1 relationship with the item.
--- EDIT ---
Some questions you'll need to ask yourself before being able to arrive at optimal database design:
Is there a real 1:1 relationship between item and detail or is it actually 1:0..1 (i.e. some details are optional?).
If 1:1, just using columns is the most natural representation. BTW, a decent DBMS will have no trouble handling 100 columns.
If 1:0..1, you'll have to decide whether to use NULL-able columns, or separate tables. Just keep in mind that most DBMSes are really efficient in storing NULLs (typically just a small bitmap per row), so separating the data to a different table might not get you much, and in fact may substantially worsen the querying performance due to the increased need for JOINing.
Are all detail kinds predetermined (i.e. can you confidently say you won't need to add any new kinds of details later in the application's lifecycle)?
If yes, just use columns.
If no, adding columns on the large existing database can be expensive - whether it is expensive enough to warrant using separate table is up to you to measure.
You could also consider generalizing all the details as name/value pairs and representing them within a single 1:N table (not shown here). This is very flexible and "evolvable", but has its own set of problems.
How do you intend to query the data? This is a biggie and may influence substantially whether to go with "columns" or "separate table" approach, indexing etc...
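The name/value generalization mentioned in the questions above could look roughly like this. It is a sketch only: SQLite via Python's sqlite3, with hypothetical item and item_detail tables and a hypothetical details helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE item (item_id INTEGER PRIMARY KEY);
    -- One row per (item, detail name); new detail kinds need no schema change.
    CREATE TABLE item_detail (
        item_id INTEGER NOT NULL REFERENCES item(item_id),
        name    TEXT    NOT NULL,
        value   TEXT,
        PRIMARY KEY (item_id, name)
    );
    INSERT INTO item VALUES (1);
    INSERT INTO item_detail VALUES (1, 'Voltage', '230'), (1, 'Temperature', '21.5');
""")

def details(item_id):
    """Collect all name/value pairs of one item into a dict."""
    rows = conn.execute(
        "SELECT name, value FROM item_detail WHERE item_id = ?", (item_id,)
    )
    return dict(rows.fetchall())
```

One of the problems with this layout is visible even in the sketch: every value is stored as text, so the database can no longer declare a type or CHECK constraint per detail.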
BTW, the 1:0..1 with separate tables can be modeled like this...
...and 1:1 can be modeled like this...
...but this introduces circular dependency that must be handled in a special way (usually by deferring one of the FOREIGN KEYs).
1:N details, of course, are another matter and are naturally modeled through separate tables.
Since you say "detail 1" and "detail 2" are 1:(0..)1 and "detail 3" is 1:N, your "updated" data model would probably look something like this:
BTW, the above model uses identifying relationships which result in more "natural" keys. Non-identifying relationships / surrogate keys approach would look like this:
Each approach has its advantages, but this post is becoming a little long already ;) ...
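For readers who prefer DDL to diagrams, the identifying-relationship variant might be sketched as below. This is one possible reading under stated assumptions, not a definitive schema: it uses SQLite via Python's sqlite3, reuses the table names from the question, and introduces a hypothetical DetailNo column to number the 1:N rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE tbl_Users (UserID INTEGER PRIMARY KEY);
    CREATE TABLE tbl_Item (
        ItemID INTEGER PRIMARY KEY,
        UserID INTEGER NOT NULL REFERENCES tbl_Users(UserID)
    );
    -- 1:0..1: the child's primary key is also its foreign key,
    -- so an item can have at most one row here.
    CREATE TABLE tbl_ItemDetail1 (
        ItemID    INTEGER PRIMARY KEY REFERENCES tbl_Item(ItemID),
        "Count"   INTEGER,
        Duration  INTEGER,
        Frequency INTEGER
    );
    -- 1:N: the parent's key is part of a composite primary key.
    -- DetailNo is a hypothetical column added for this sketch.
    CREATE TABLE tbl_ItemDetail3 (
        ItemID   INTEGER NOT NULL REFERENCES tbl_Item(ItemID),
        DetailNo INTEGER NOT NULL,
        Value1   TEXT,
        PRIMARY KEY (ItemID, DetailNo)
    );
    INSERT INTO tbl_Users VALUES (1);
    INSERT INTO tbl_Item VALUES (10, 1);
    INSERT INTO tbl_ItemDetail1 VALUES (10, 3, 60, 2);
    INSERT INTO tbl_ItemDetail3 VALUES (10, 1, 'a'), (10, 2, 'b');
""")

# A second ItemDetail1 row for the same item breaks the 1:0..1 rule.
try:
    conn.execute("INSERT INTO tbl_ItemDetail1 VALUES (10, 9, 9, 9)")
    second_detail1_rejected = False
except sqlite3.IntegrityError:
    second_detail1_rejected = True

detail3_rows = conn.execute(
    "SELECT COUNT(*) FROM tbl_ItemDetail3 WHERE ItemID = 10"
).fetchone()[0]
```

In this variant a separate ItemDetail1ID column is not needed, because ItemID already identifies the detail row.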
Your question cannot be answered in one simple SO post. There are a lot of things to consider when creating a database. The best thing I ever did to learn about databases and how to create them was to read a book called "Database Design For Mere Mortals" written by Michael Hernandez.
See my post on Programmers to the question How do you approach database design?

Is it good practice to have foreign keys in a datawarehouse (relationships)?

I think the question is clear enough. Some of the columns in my datawarehouse table could have a relationship to a primary key. But is it good practice? It is denormalized, so it should never be deleted again (data in datawarehouse). Hope question is somewhat clear enough.
I presume that you refer to FKs in fact tables. During DW loading, indexes and any foreign keys are dropped to speed up the loading -- the ETL process takes care of keys.
Foreign key constraint "activates" during inserts and updates (this is when it needs to check that the key value exists in the parent table) and during deletes of primary keys in parent tables. It does not play a part during reads. Deleting records in a DW is (or should be) a controlled process which scans for any existing relationships before deleting from dimension tables.
So, most DWs do not have foreign keys implemented as constraints.
FK constraints work well in Kimball dimensional models on SQL Server.
Typically, your ETL will need to lookup into the dimension table (usually on the business key to handle slowly changing dimensions) to determine dimension surrogate IDs, and the dimension surrogate id is usually an identity, and the PK on the dimension is usually the dimension surrogate id, which is already an index (probably clustered).
Having RI at this point is not a huge overhead on the writes, and it can also help catch ETL defects during development. Also, having the PK of the fact table be a combination of all the FKs can help trap potential data modeling problems and double-loading.
It can actually reduce overhead on selects if you like to make general-use flattened views or table-valued functions of your star models. Because extra inner joins to dimensions are guaranteed to produce one and only one row, the optimizer can use these constraints very effectively to eliminate the need to look up into the table. Without FK constraints, these lookups may have to be done to eliminate facts where the dimension does not exist.
Using FK-constraints in a DW is like wearing a bicycle helmet. If the ETL is designed correctly, you technically don't need them. That said, if I had a million dollars for every time I've seen bug-free ETL, I'd have zero dollars.
Until you're at a point where FK-constraints are causing performance issues, I say leave'em. Cleaning up referential integrity problems can be much harder than adding them from the get-go ;-)
The question is clear, but "good practice" seems the wrong question.
"Could have FK's" ?
Foreign keys are a mechanism to preserve integrity constraints during database modifications.
If your DW is read-only (accumulating data sources without writing back), there is no need for FK's.
If your DW supports writes, integrity constraints typically need to be coordinated across the participating data sources by the ETL (or rather, its Store equivalent). This process may or may not rely on FK's in the database.
So the right question would be: do you need them.
(The only other reason I can think of would be documentation of relationship - however, this can be done on paper / in a separate document, too.)
I have no idea. But nobody is answering, so I googled and found a best practices paper which seems to say the very helpful "it depends" :-)
While foreign key constraints help data integrity, they have an associated cost on all insert, update and delete statements. Give careful attention to the use of constraints in your warehouse or ODS when you wish to ensure data integrity and validation
The reason for using a foreign key constraint in a data warehouse is the same as for any other database: to ensure data integrity.
It is also possible that query performance will benefit because foreign keys permit certain types of query rewrite that are not normally possible without them. Data integrity is still the main reason to use foreign keys however.
Yes, as a best practice, implement the FK constraints on your fact tables. In SQL Server, use NOCHECK. In ORACLE always use RELY DISABLE NOVALIDATE. This allows the warehouse or mart to know about the relationship, but not check it on INSERT, UPDATE, or DELETE operations. Star transformations, optimizations, etc. may not rely on the FK constraints to improve queries like they used to, but one never knows what BI or OLAP tools will be used on the front side of your warehouse or mart. Some of these tools can make use of knowing the relationships are defined. Plus, how many ugly looking warehouses have you seen with little or no external documentation and had to try to reverse engineer them? Defining the FKs always helps with that.
As designers we NEVER seem to make our data warehouses or marts as self-documenting as we should. Defining FKs certainly helps with that. Now, having said this, if star schemas are properly designed without FKs being defined, it is easy to read and understand them anyway.
And for ORACLE fact tables, always define a LOCAL BITMAP index on every FK to a dimension. Just do it. The indexing is actually more important than the FK being defined.
There is a very good reason to create FK constraints in even read-only DW/DM.
Yes, they are not really required from the point of view of the read-only DW itself, if your ETL is bullet-proof, etc., etc. But guess what - life doesn't stop at loading data into the DW. Most of the BI analytical/reporting tools use information about your DW relationships to automatically build their model (for example the SSAS Tabular model).
In my humble opinion this alone outweighs the little overhead on dropping and recreating FK constraints during ETL process.

Can a database table contain more than one primary key?

Can a database table contain more than one primary key?
Yes, I am talking about RDBMS.
A table can have:
No primary keys;
One primary key consisting of one column; or
One composite primary key consisting of two or more columns.
Other than that you can have any number of unique indexes, which will do basically the same thing.
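The cases listed above can be tried out directly. The sketch below assumes SQLite via Python's sqlite3, with hypothetical enrollment and bad tables; other DBMSs word the error differently but reject the second PRIMARY KEY declaration as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)

# One composite primary key plus an additional unique key: accepted.
conn.execute("""
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL,
        course_id  INTEGER NOT NULL,
        ticket_no  TEXT    NOT NULL UNIQUE,
        PRIMARY KEY (student_id, course_id)
    )
""")

# Two separate PRIMARY KEY declarations on one table: refused.
try:
    conn.execute(
        "CREATE TABLE bad (a INTEGER PRIMARY KEY, b INTEGER PRIMARY KEY)"
    )
    two_primary_keys_rejected = False
except sqlite3.OperationalError:
    two_primary_keys_rejected = True

# List the tables that were actually created.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]
```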
The primary key of a relational table uniquely identifies each record in the table.
So, in order to keep the uniqueness of each record, you can't have more than one primary key for the table.
It can either be a normal attribute that is guaranteed to be unique (such as Social Security Number in a table with no more than one record per person) or it can be generated by the DBMS (such as a globally unique identifier, or GUID, in Microsoft SQL Server). Primary keys may consist of a single attribute or multiple attributes in combination.
That's why it is called Primary Key because it is, well, PRIMARY
Yes, you can have Composite primary keys, that is, having two fields as a primary key.
"First of all, you have to understand the history of entity-relationship design methodology as well as understand the word "relational" in relational database management systems (RDBMS)."
May I suggest politely that you first get YOURSELF educated on these very same subjects before leading other people into flawed beliefs ? I'll respond to the two worst ones of your stupidities below.
"According to relational methodology principles, each entity should only have one and only one means to identify it."
That is about the biggest crap I have ever heard anybody spawn around about relational data design. The relational model does not constrain any "entity", as you erroneously call it, to have any precise number of keys. Any "entity" can have any number of keys, and EACH key is, by definition of its very property of making the "rows" unique, a valid candidate for any purpose of "identification". Choosing the most useful/appropriate one for use in certain contexts (foreign keys in referencing tables, e.g.), is a design issue, and the relational model does not have anything to say on such things.
"Therefore, "R"DBMS attempts to facilitate the modeling of entity relationships."
Codd's paper "A Relational Model of Data for Large Shared Data Banks", which marks the birth of the relational model, predates the invention of E-R by a number of years. So to say that the Relational model attempts to facilitate the modeling of E-R concepts, is having things COMPLETELY backwards, and nothing but a display of one's own complete and utter ignorance of "the history" that you referred to in your own answer.
The short answer is yes. A primary key is a candidate key and is in principle no different to any other candidate key. It is a widely observed convention that one candidate key per table is designated as the "primary" one - meaning that it is "preferred" or has some special meaning for the database designer or user. This is just convention however. It is only a label of convenience and a reminder about the potential significance of one key. In practice all keys can serve the same purpose and the "primary" one is not special or unique in any fundamental way.
First of all, you have to understand the history of entity-relationship design methodology as well as understand the word "relational" in relational database management systems (RDBMS).
In order to define the bounds of an entity and relationships to be formed, there must be a unique handle or a unique combination of handles to identify each single instance of an entity and then to form relationships between them.
You also need to understand the meaning/root of the word "identify" which is to zero in on the "identity" of each instance of an entity. "identity" being the mathematical term meaning "one" or a singularity.
According to relational methodology principles, each entity should only have one and only one means to identify it. Therefore, "R"DBMS attempts to facilitate the modeling of entity relationships. Note the differences between "Entity/Class" and "Entity/Class instance".
However, RDBMS is used widely and mostly by people not so interested in accurately portraying the E-R design principles. As a result, we frequently have more than one possible entity-definition sitting inside a table, which I call entity-aliasing. As opposed to identity-aliasing, where two or more instances of an entity-set hide under the same key, entity-aliasing is when a table like
EmpProj([empId], empName, empAddr, projId, projLoc)
actually has two entity-sets aliased under the same table:
Emp([empId], empName, empAddr)
Proj([projId], projLoc, empId)
That is when normalisation comes in - to separate these entities out. Try as we might to do a decent design normalisation, computer scientists may not have as good a perspective on the information as a statistician. The computer scientist (which in this discussion includes everyone with a decent knowledge of ER design) tries his/her best in creating a schema that cleanly defines entities and their relationships.
However, after 18 months analysing voluminous information from the database, the statistician begins to see principal components emerge whose analysis is terribly crippled by the misalignment of those principal components with the boundaries of the computer scientists' perceived entities.
That is what alternate unique keys are good for - to identify instances of entities due to the principal components existing as ghost-entities in the database.
Therefore, a table has a primary key because that table is perceived to be a perfect entity, as an entity should have only one primary key, be it singular or composite.
As far as the statistician is concerned, even though the database allows only one primary key per table, the alternate unique keys are, to the statistician, the primary keys to those ghost-entities. Which is why sometimes you are frustrated by statisticians who seem to do double work by downloading the data into the local database of their workstation/PC.
In conclusion, the constraint placed by the "R"DBMS manufacturer in allowing only one primary key per table is their pretense in believing that they know how information behaves and believing that principal components of the information due to the population do not mutate over time.
If more than one unique key is possible in a table, it means one or more of the following:
Like myself, you are too lazy to separate them, since they seem to work quite well.
For performance's sake, mixing the entities into the same table makes the application run incredibly faster.
Like the statistician, you gradually discover ghost entities in your information.

What's wrong with foreign keys?

I remember hearing Joel Spolsky mention in podcast 014 that he'd barely ever used a foreign key (if I remember correctly). However, to me they seem pretty vital to avoid duplication and subsequent data integrity problems throughout your database.
Do people have some solid reasons as to why (to avoid a discussion in lines with Stack Overflow principles)?
Edit: "I've yet to have a reason to create a foreign key, so this might be my first reason to actually set up one."
Reasons to use Foreign Keys:
you won't get Orphaned Rows
you can get nice "on delete cascade" behavior, automatically cleaning up tables
knowing about the relationships between tables in the database helps the Optimizer plan your queries for most efficient execution, since it is able to get better estimates on join cardinality.
FKs give a pretty big hint on what statistics are most important to collect on the database, which in turn leads to better performance
they enable all kinds of auto-generated support -- ORMs can generate themselves, visualization tools will be able to create nice schema layouts for you, etc.
someone new to the project will get into the flow of things faster since otherwise implicit relationships are explicitly documented
Reasons not to use Foreign Keys:
you are making the DB work extra on every CRUD operation because it has to check FK consistency. This can be a big cost if you have a lot of churn
by enforcing relationships, FKs specify an order in which you have to add/delete things, which can lead to refusal by the DB to do what you want. (Granted, in such cases, what you are trying to do is create an Orphaned Row, and that's not usually a good thing). This is especially painful when you are doing large batch updates, and you load up one table before another, with the second table creating consistent state (but should you be doing that sort of thing if there is a possibility that the second load fails and your database is now inconsistent?).
sometimes you know beforehand your data is going to be dirty, you accept that, and you want the DB to accept it
you are just being lazy :-)
I think (I am not certain!) that most established databases provide a way to specify a foreign key that is not enforced, and is simply a bit of metadata. Since non-enforcement wipes out every reason not to use FKs, you should probably go that route if any of the reasons in the second section apply.
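SQLite happens to be a convenient place to see a declared-but-unenforced foreign key, because enforcement there is controlled by a per-connection pragma. The sketch below (Python's sqlite3, hypothetical customer and orders tables) shows that the relationship remains available as metadata while orphan rows are still accepted. Other DBMSs have their own syntax for this, which is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = OFF")  # FK is declared but not enforced
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id)
    );
    INSERT INTO orders VALUES (1, 42);  -- orphan row, accepted without complaint
""")

# Each row is (id, seq, table, from, to, on_update, on_delete, match).
fk_rows = conn.execute("PRAGMA foreign_key_list(orders)").fetchall()
# Reduce to (child column, parent table, parent column).
relationships = [(row[3], row[2], row[4]) for row in fk_rows]
orphans = conn.execute("PRAGMA foreign_key_check").fetchall()
```

Tools such as ORMs or schema visualizers can read relationships of this kind, while foreign_key_check remains available to report the dirty rows on demand.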
This is an issue of upbringing. If somewhere in your educational or professional career you spent time feeding and caring for databases (or worked closely with talented folks who did), then the fundamental tenets of entities and relationships are well-ingrained in your thought process. Among those rudiments is how/when/why to specify keys in your database (primary, foreign and perhaps alternate). It's second nature.
If, however, you've not had such a thorough or positive experience in your past with RDBMS-related endeavors, then you've likely not been exposed to such information. Or perhaps your past includes immersion in an environment that was vociferously anti-database (e.g., "those DBAs are idiots - we few, we chosen few java/c# code slingers will save the day"), in which case you might be vehemently opposed to the arcane babblings of some dweeb telling you that FKs (and the constraints they can imply) really are important if you'd just listen.
Most everyone was taught when they were kids that brushing your teeth was important. Can you get by without it? Sure, but somewhere down the line you'll have less teeth available than you could have if you had brushed after every meal. If moms and dads were responsible enough to cover database design as well as oral hygiene, we wouldn't be having this conversation. :-)
I'm sure there are plenty of applications where you can get away with it, but it's not the best idea. You can't always count on your application to properly manage your database, and frankly managing the database should not be of very much concern to your application.
If you are using a relational database then it seems you ought to have some relationships defined in it. Unfortunately this attitude (you don't need foreign keys) seems to be embraced by a lot of application developers who would rather not be bothered with silly things like data integrity (but need to because their companies don't have dedicated database developers). Usually in databases put together by these types you are lucky just to have primary keys ;)
Foreign keys are essential to any relational database model.
I always use them, but then I make databases for financial systems. The database is the critical part of the application. If the data in a financial database isn't totally accurate then it really doesn't matter how much effort you put into your code/front-end design. You're just wasting your time.
There's also the fact that multiple systems generally need to interface directly with the database - from other systems that just read data out (Crystal Reports) to systems that insert data (not necessarily using an API I've designed; it may be written by a dull-witted manager who has just discovered VBScript and has the SA password for the SQL box). If the database isn't as idiot-proof as it can possibly be, well - bye bye database.
If your data is important, then yes, use foreign keys, create a suite of stored procedures to interact with the data, and make the toughest DB you can. If your data isn't important, why are you making a database to begin with?
Update: I always use foreign keys now. My answer to the objection "they complicate testing" is "write your unit tests so they don't need the database at all. Any tests that use the database should use it properly, and that includes foreign keys. If the setup is painful, find a less painful way to do the setup."
Foreign keys complicate automated testing
Suppose you're using foreign keys. You're writing an automated test that says "when I update a financial account, it should save a record of the transaction." In this test, you're only concerned with two tables: accounts and transactions.
However, accounts has a foreign key to contracts, and contracts has a fk to clients, and clients has a fk to cities, and cities has a fk to states.
Now the database will not allow you to run your test without setting up data in four tables that aren't related to your test.
There are at least two possible perspectives on this:
"That's a good thing: your test should be realistic, and those data constraints will exist in production."
"That's a bad thing: you should be able to unit test pieces of the system without involving other pieces. You can add integration tests for the system as a whole."
It may also be possible to temporarily turn off foreign key checks while running tests. MySQL, at least, supports this.
"They can make deleting records more cumbersome - you can't delete the "master" record where there are records in other tables where foreign keys would violate that constraint."
It's important to remember that the SQL standard defines actions that are taken when a foreign key is deleted or updated.
The ones I know of are:
ON DELETE RESTRICT - Prevents any rows in the other table that have keys in this column from being deleted. This is what Ken Ray described above.
ON DELETE CASCADE - If a row in the other table is deleted, delete any rows in this table that reference it.
ON DELETE SET DEFAULT - If a row in the other table is deleted, set any foreign keys referencing it to the column's default.
ON DELETE SET NULL - If a row in the other table is deleted, set any foreign keys referencing it in this table to null.
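ON DELETE NO ACTION - Takes no automatic action on the referencing rows; if any of them still reference the deleted row once the statement completes (or at commit, for a deferred constraint), the delete fails with an error. In practice it behaves much like RESTRICT, except that the check happens later.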
These same actions also apply to ON UPDATE.
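Two of these actions, CASCADE and SET NULL, can be seen side by side in the following sketch. It assumes SQLite via Python's sqlite3 and hypothetical author, post and comment tables; the syntax of the ON DELETE clause is the same in most DBMSs.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE author (author_id INTEGER PRIMARY KEY);
    CREATE TABLE post (
        post_id   INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES author(author_id) ON DELETE CASCADE
    );
    CREATE TABLE comment (
        comment_id INTEGER PRIMARY KEY,
        author_id  INTEGER REFERENCES author(author_id) ON DELETE SET NULL
    );
    INSERT INTO author  VALUES (1);
    INSERT INTO post    VALUES (10, 1);
    INSERT INTO comment VALUES (20, 1);
    DELETE FROM author WHERE author_id = 1;
""")

# CASCADE removed the post together with its author.
posts_left = conn.execute("SELECT COUNT(*) FROM post").fetchone()[0]
# SET NULL kept the comment but cleared its reference.
comment_author = conn.execute(
    "SELECT author_id FROM comment WHERE comment_id = 20"
).fetchone()[0]
```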
The default seems to depend on which SQL database server you're using.
@imphasing - this is exactly the kind of mindset that causes maintenance nightmares.
Why oh why would you ignore declarative referential integrity, where the data can be guaranteed to be at least consistent, in favour of so called "software enforcement" which is a weak preventative measure at best.
There's one good reason not to use them: If you don't understand their role or how to use them.
In the wrong situations, foreign key constraints can lead to waterfall replication of accidents. If somebody removes the wrong record, undoing it can become a mammoth task.
Also, conversely, when you need to remove something, if poorly designed, constraints can cause all sorts of locks that prevent you.
There are no good reasons not to use them... unless orphaned rows aren't a big deal to you I guess.
"Before adding a record, check that a corresponding record exists in another table" is business logic.
Here are some reasons you don't want this in the database:
If the business rules change, you have to change the database. The database will need to recreate the index in a lot of cases and this is slow on large tables. (Changing rules include: allow guests to post messages or allow users to delete their account despite having posted comments, etc).
Changing the database is not as easy as deploying a software fix by pushing the changes to the production repository. We want to avoid changing the database structure as much as possible. The more business logic there is in the database, the more you increase the chances of needing to change the database (and triggering re-indexing).
TDD. In unit tests you can substitute the database for mocks and test the functionality. If you have any business logic in your database, you are not doing complete tests and would need to either test with the database or replicate the business logic in code for testing purposes, duplicating the logic and increasing the likelihood of the logic not working in the same way.
Reusing your logic with different data sources. If there is no logic in the database, my application can create objects from records from the database, create them from a web service, a json file or any other source. I just need to swap out the data mapper implementation and can use all my business logic with any source. If there is logic in the database, this isn't possible and you have to implement the logic at the data mapper layer or in the business logic. Either way, you need those checks in your code. If there's no logic in the database I can deploy the application in different locations using different database or flat-file implementations.
From my experience it's always better to avoid using FKs in database-critical applications. I would not disagree with the people here who say FKs are good practice, but they are not practical where the database is huge and handles a huge number of CRUD operations per second. I can share without naming names: one of the biggest investment banks doesn't have a single FK in its databases. These constraints are handled by programmers while creating applications involving the DB. The basic reason is that whenever a new CRUD operation is done, it has to affect multiple tables and be verified for each insert/update. This won't be a big issue for queries affecting single rows, but it does create huge latency in batch processing, which any big bank has to do as a daily task.
It's better to avoid FKs, but the resulting risk has to be handled by programmers.
Bigger question is: would you drive with a blindfold on? That’s how it is if you develop a system without referential constraints. Keep in mind, that business requirements change, application design changes, respective logical assumptions in the code changes, logic itself can be refactored, and so on. In general, constraints in databases are put in place under contemporary logical assumptions, seemingly correct for particular set of logical assertions and assumptions.
Through the lifecycle of an application, referential and data checks constraints police data collection via the application, especially when new requirements drive logical application changes.
To the subject of this listing - a foreign key does not by itself "improve performance", nor does it "degrade performance" significantly from the standpoint of a real-time transaction processing system. However, there is an aggregated cost for constraint checking in a HIGH volume "batch" system. So here is the difference, real-time vs. batch transaction processing: in batch processing, the aggregated cost incurred by constraint checks across a sequentially processed batch poses a performance hit.
In a well designed system, data consistency checks would be done "before" processing a batch through (nevertheless, there is a cost associated here also); therefore, foreign key constraint checks are not required during load time. In fact all constraints, including foreign key, should be temporarily disabled till the batch is processed.
QUERY PERFORMANCE - if tables are joined on foreign keys, be aware that in many DBMSs foreign key columns are NOT automatically indexed (though the respective primary key is indexed by definition). What helps performance is indexing the foreign key (or any key, for that matter) and joining tables on the indexed columns, not merely having a foreign key constraint on a non-indexed column.
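The effect of indexing the foreign key column can be observed in the query plan. The following is a sketch assuming SQLite via Python's sqlite3, with hypothetical parent/child tables and a hypothetical plan helper; the exact plan wording differs between SQLite versions and between DBMSs, so the example only looks at whether the index is mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE parent (parent_id INTEGER PRIMARY KEY);
    CREATE TABLE child (
        child_id  INTEGER PRIMARY KEY,
        parent_id INTEGER NOT NULL REFERENCES parent(parent_id),
        note      TEXT
    );
""")

def plan(sql):
    """Return the query plan steps for sql as one string."""
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[-1] for row in rows)

query = "SELECT * FROM child WHERE parent_id = 1"

# Declaring the FK did not create an index on child.parent_id.
plan_before = plan(query)

conn.execute("CREATE INDEX idx_child_parent ON child(parent_id)")
plan_after = plan(query)

index_used_before = "idx_child_parent" in plan_before
index_used_after = "idx_child_parent" in plan_after
```

Before the index exists, the lookup has to scan the child table; afterwards the plan refers to idx_child_parent.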
Changing subjects, if a database is just supporting website display/rendering content/etc and recording clicks, then a database with full constraints on all tables is overkill for such purposes. Think about it. Most websites don’t even use a database for such. For similar requirements, where data is just being recorded and not referenced per se, use an in-memory database, which does not have constraints. This doesn’t mean that there is no data model: there is a logical model, but no physical data model.
I agree with the previous answers in that they are useful to maintain data consistency. However, there was an interesting post by Jeff Atwood some weeks ago that discussed the pros and cons of normalized and consistent data.
In a few words, a denormalized database can be faster when handling huge amounts of data; and you may not care about precise consistency depending on the application, but it forces you to be much more careful when dealing with data, as the DB won't be.
The Clarify database is an example of a commercial database that has no primary or foreign keys.
http://www.geekinterview.com/question_details/18869
The funny thing is, the technical documentation goes to great lengths to explain how tables are related, what columns to use to join them etc.
In other words, they could have joined the tables with explicit declarations (DRI) but they chose not to.
Consequently, the Clarify database is full of inconsistencies and it underperforms.
But I suppose it made the developers' job easier, not having to write code to deal with referential integrity, such as checking for related rows before deleting or adding.
And that, I think, is the main benefit of not having foreign key constraints in a relational database. It makes it easier to develop, at least that is from a devil-may-care point of view.
If you are absolutely sure that the underlying database system will not change in the future, I would use foreign keys to ensure data integrity.
But here is another very good real-life reason not to use foreign keys at all:
You are developing a product, which should support different database systems.
If you are working with the Entity Framework, which is able to connect to many different database systems, you may also want to support "open-source-free-of-charge" serverless databases. Not all of these databases may support your foreign key rules (updating, deleting rows...).
This can lead to different problems:
1.) You may run into errors when the database structure is created or updated. There may even be only silent errors, because your foreign keys are simply ignored by the database system.
2.) If you rely on foreign keys, you will probably make fewer or even no data integrity checks in your business logic. Now, if the new database system does not support these foreign key rules or just behaves in a different way, you have to rewrite your business logic.
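As one concrete case of point 1, SQLite accepts a foreign key declaration but only enforces it when the foreign_keys pragma is switched on for the connection. The sketch below uses Python's built-in sqlite3 module; the parent/child tables and the helper name are made up for illustration, and the pragma is set explicitly in both cases so the result does not depend on how the library was compiled.

```python
import sqlite3

SCHEMA = """
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE child (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER NOT NULL REFERENCES parent(id)
    );
"""

def orphan_insert_accepted(enforce):
    """Try to insert a child row whose parent does not exist."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = " + ("ON" if enforce else "OFF"))
    conn.executescript(SCHEMA)
    try:
        conn.execute("INSERT INTO child (id, parent_id) VALUES (1, 999)")
        return True    # the engine stored an orphaned row without complaint
    except sqlite3.IntegrityError:
        return False   # the engine rejected the row
    finally:
        conn.close()

accepted_when_off = orphan_insert_accepted(enforce=False)
accepted_when_on = orphan_insert_accepted(enforce=True)
```

The same schema and the same insert give opposite outcomes, which is exactly the kind of difference that stays hidden until orphaned rows show up.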
You may ask: Who needs different database systems? Well, not everybody can afford or wants a full-blown SQL Server on their machine. It is software that needs to be maintained. Others have already invested time and money in some other DB system. Serverless databases are great for small customers on only one machine.
Nobody knows, how all of these DB systems behave, but your business logic, with integrity checks, always stays the same.
They can make deleting records more cumbersome - you can't delete the "master" record where there are records in other tables where foreign keys would violate that constraint. You can use triggers to have cascading deletes.
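Besides triggers, many engines also let you declare the cascade on the foreign key itself. Here is a small sketch of both behaviours, using Python's built-in sqlite3 module as a stand-in; the master/detail tables and the helper are made up for illustration.

```python
import sqlite3

def delete_master(on_delete):
    """Delete a master row that still has a detail row.

    Returns the number of detail rows left, or None if the delete was refused.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")   # SQLite needs this to enforce FKs
    conn.executescript(f"""
        CREATE TABLE master (id INTEGER PRIMARY KEY);
        CREATE TABLE detail (
            id INTEGER PRIMARY KEY,
            master_id INTEGER NOT NULL
                REFERENCES master(id) ON DELETE {on_delete}
        );
        INSERT INTO master (id) VALUES (1);
        INSERT INTO detail (id, master_id) VALUES (10, 1);
    """)
    try:
        conn.execute("DELETE FROM master WHERE id = 1")
    except sqlite3.IntegrityError:
        return None
    return conn.execute("SELECT COUNT(*) FROM detail").fetchone()[0]

blocked = delete_master("NO ACTION")   # the delete is refused
cascaded = delete_master("CASCADE")    # the detail row goes with the master
```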
If you chose your primary key unwisely, then changing that value becomes even more complex. For example, if I have the PK of my "customers" table as the person's name, and make that key a FK in the "orders" table, then if the customer wants to change his name, it is a royal pain... but that is just shoddy database design.
I believe the advantages of using foreign keys outweigh any supposed disadvantages.
Verifying foreign key constraints takes some CPU time, so some folks omit foreign keys to get some extra performance.
Additional Reason to use Foreign Keys:
- Allows greater reuse of a database
Additional Reason to NOT use Foreign Keys:
- You are trying to lock-in a customer into your tool by reducing reuse.
I know only Oracle databases, no other ones, and I can tell that Foreign Keys are essential for maintaining data integrity. Prior to inserting data, a data structure needs to be made, and be made correctly. When that is done - and thus all primary AND foreign keys are created - the work is done !
Meaning : orphaned rows ? No. Never seen that in my life. Unless a bad programmer forgot the foreign key, or if he implemented that on another level. Both are - in context of Oracle - huge mistakes, which will lead to data duplication, orphan data, and thus : data corruption. I can't imagine a database without FK enforced. It looks like chaos to me. It's a bit like the Unix permission system : imagine that everybody is root. Think of the chaos.
Foreign Keys are essential, just like Primary Keys. It's like saying : what if we removed Primary Keys ? Well, total chaos is going to happen. That's what. You may not move the primary or foreign key responsibility to the programming level, it must be at the data level.
Drawbacks ? Yes, absolutely ! Because on insert, a lot more checks are going to be happening. But, if data integrity is more important than performance, it's a no-brainer. The problem with performance on Oracle is more related to indexes, which come with PK and FK's.
The argument I have heard is that the front-end should have these business rules. Foreign keys "add unnecessary overhead" when you shouldn't be allowing any insertions that break your constraints in the first place. Do I agree with this? No, but that is what I have always heard.
EDIT: My guess is he was referring to foreign key constraints, not foreign keys as a concept.
To me, if you want to go by the ACID standards, it is critical to have foreign keys to ensure referential integrity.
I have to second most of the comments here, Foreign Keys are necessary items to ensure that you have data with integrity. The different options for ON DELETE and ON UPDATE will allow you to get around some of the "down falls" that people mention here regarding their use.
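As a rough illustration of those options, the sketch below reuses the "customer name as primary key" scenario mentioned earlier in this thread. It runs on Python's built-in sqlite3 module; the table contents are made up, and which referential actions are available differs between engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite needs this to enforce FKs
conn.executescript("""
    CREATE TABLE customers (name TEXT PRIMARY KEY);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_name TEXT
            REFERENCES customers(name)
            ON UPDATE CASCADE      -- renames follow the parent
            ON DELETE SET NULL     -- orders survive, the reference is cleared
    );
    INSERT INTO customers (name) VALUES ('Ann Smith');
    INSERT INTO orders (id, customer_name) VALUES (1, 'Ann Smith');
""")

conn.execute("UPDATE customers SET name = 'Ann Jones' WHERE name = 'Ann Smith'")
after_rename = conn.execute(
    "SELECT customer_name FROM orders WHERE id = 1").fetchone()[0]

conn.execute("DELETE FROM customers WHERE name = 'Ann Jones'")
after_delete = conn.execute(
    "SELECT customer_name FROM orders WHERE id = 1").fetchone()[0]
```

After the rename the order points at 'Ann Jones'; after the delete the order is still there with a NULL customer_name.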
I find that in 99% of all my projects I will have FK's to enforce the integrity of the data, however, there are those rare occasions where I have clients that MUST keep their old data, regardless of how bad it is....but then I spend a lot of time writing code that goes in to only get the valid data anyway, so it becomes pointless.
How about maintainability and constancy across application life cycles? Most data has a longer lifespan than the applications that make use of it. Relationships and data integrity are much too important to leave to the hope that the next dev team gets it right in the app code. If you haven't worked on a db with dirty data that doesn't respect the natural relationships, you will. The importance of data integrity will then become very clear.
I also think that foreign keys are a necessity in most databases. The only drawback (besides the performance hit that comes with having enforced consistency) is that having a foreign key allows people to write code that assumes there is a functional foreign key. That should never be allowed.
For example, I've seen people write code that inserts into the referenced table and then attempts inserts into the referencing table without verifying the first insert was successful. If the foreign key is removed at a later time, that results in an inconsistent database.
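One way to avoid that pattern, sketched here with Python's built-in sqlite3 module, is to put both inserts in a single transaction so that a failed first insert stops the second one from running. The invoice tables and the add_invoice helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE invoice (id INTEGER PRIMARY KEY);
    CREATE TABLE invoice_line (
        id INTEGER PRIMARY KEY,
        invoice_id INTEGER NOT NULL REFERENCES invoice(id),
        amount REAL NOT NULL
    );
""")

def add_invoice(invoice_id, amounts):
    """Insert an invoice and its lines as one unit; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("INSERT INTO invoice (id) VALUES (?)", (invoice_id,))
            conn.executemany(
                "INSERT INTO invoice_line (invoice_id, amount) VALUES (?, ?)",
                [(invoice_id, a) for a in amounts],
            )
        return True
    except sqlite3.IntegrityError:
        return False

first = add_invoice(1, [10.0, 5.5])
duplicate = add_invoice(1, [99.0])   # parent insert fails, so no line is written
line_count = conn.execute("SELECT COUNT(*) FROM invoice_line").fetchone()[0]
```

This check does not depend on the foreign key being present: the duplicate call fails on the parent insert and never reaches the child table.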
You also don't have the option of assuming a specific behavior on update or delete. You still need to write your code to do what you want regardless of whether there is a foreign key present. If you assume deletes are cascaded when they are not, your deletes will fail. If you assume updates to the referenced columns are propagated to the referencing rows when they are not, your updates will fail. For the purposes of writing code, you might as well not have those features.
If those features are turned on, then your code will emulate them anyway and you'll lose a little performance.
So, the summary.... Foreign keys are essential if you need a consistent database. Foreign keys should never be assumed to be present or functional in code that you write.
I echo the answer by Dmitriy - very well put.
For those who are worried about the performance overhead FK's often bring, there's a way (in Oracle) you can get the query optimiser advantage of the FK constraint without the cost overhead of constraint validation during insert, delete or update. That is to create the FK constraint with the attributes RELY DISABLE NOVALIDATE. This means the query optimiser ASSUMES that the constraint has been enforced when building queries, without the database actually enforcing the constraint. You have to be very careful here to take the responsibility when you populate a table with an FK constraint like this to make absolutely sure you don't have data in your FK column(s) that violate the constraint, as if you do so you could get unreliable results from queries that involve the table this FK constraint is on.
I usually use this strategy on some tables in my data mart schema, but not in my integrated staging schema. I make sure the tables I am copying data from already have the same constraint enforced, or the ETL routine enforces the constraint.
Many of the people answering here get too hung up on the importance of referential integrity implemented via referential constraints. Working on large databases with referential integrity just does not perform well. Oracle seems particularly bad at cascading deletes. My rule of thumb is that applications should never update the database directly; updates should go through stored procedures. This keeps the code base inside the database, and means that the database maintains its integrity.
Where many applications may be accessing the database, problems do arise because of referential integrity constraints but this is down to a control.
There is a wider issue too, in that application developers may have very different requirements that database developers may not necessarily be familiar with.
I have heard this argument too - from people who forgot to put an index on their foreign keys and then complained that certain operations were slow (because constraint checking could take advantage of any index). So to sum up: There is no good reason not to use foreign keys. All modern databases support cascaded deletes, so...
One time when an FK might cause you a problem is when you have historical data that references the key (in a lookup table) even though you no longer want the key available.
Obviously the solution is to design things better up front, but I am thinking of real world situations here where you don't always have control of the full solution.
For example: perhaps you have a lookup table customer_type that lists different types of customers. Let's say you need to remove a certain customer type, but (due to business constraints) you aren't able to update the client software, and nobody envisaged this situation when developing the software. The fact that it is a foreign key in some other table may prevent you from removing the row, even though you know the historical data that references it is irrelevant.
After being burnt with this a few times you probably lean away from db enforcement of relationships.
(I'm not saying this is good - just giving a reason why you may decide to avoid FKs and db constraints in general)
I'll echo what Dmitriy said, but adding on a point.
I worked on a batch billing system that needed to insert large sets of rows on 30+ tables. We weren't allowed to do a data pump (Oracle) so we had to do bulk inserts. Those tables had foreign keys on them, but we had already ensured that they were not breaking any relationships.
Before insert, we disable the foreign key constraints so that Oracle doesn't take forever doing the inserts. After the insert is successful, we re-enable the constraints.
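The disable, load, re-enable sequence looks roughly like the sketch below. It is not the Oracle code from that billing system; it uses Python's built-in sqlite3 module as a stand-in, with made-up account/charge tables. SQLite does not re-validate existing rows when enforcement is switched back on, so the sketch runs an explicit foreign_key_check after the load, and the batch deliberately contains one bad row to show what that check reports.

```python
import sqlite3

# Autocommit mode: SQLite ignores PRAGMA foreign_keys inside a transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY);
    CREATE TABLE charge (
        id INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL REFERENCES account(id)
    );
    INSERT INTO account (id) VALUES (1), (2);
""")

batch = [(1, 1), (2, 2), (3, 7)]   # the last row points at a missing account

conn.execute("PRAGMA foreign_keys = OFF")   # skip per-row FK checks
conn.execute("BEGIN")
conn.executemany("INSERT INTO charge (id, account_id) VALUES (?, ?)", batch)
conn.execute("COMMIT")

# Validate the whole load once, then switch enforcement back on.
violations = conn.execute("PRAGMA foreign_key_check").fetchall()
conn.execute("PRAGMA foreign_keys = ON")
bad_charge_ids = [row[1] for row in violations]   # second column is the rowid
```

If the data was validated before the load, as described above, the violations list comes back empty.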
PS: In a large database with many foreign keys and child row data for a single record, sometimes foreign keys can be bad, and you may want to disallow cascading deletes. For us in the billing system, it would take too long and be too taxing on the database if we did cascading deletes, so we just mark the record as bad with a field on the main driver (parent) table.
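Marking the parent instead of cascading a delete could look like the sketch below, again using Python's built-in sqlite3 module as a stand-in. The bill/bill_item tables and the is_bad column are hypothetical names, not the ones from that system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE bill (
        id INTEGER PRIMARY KEY,
        is_bad INTEGER NOT NULL DEFAULT 0   -- hypothetical flag column
    );
    CREATE TABLE bill_item (
        id INTEGER PRIMARY KEY,
        bill_id INTEGER NOT NULL REFERENCES bill(id)
    );
    INSERT INTO bill (id) VALUES (1), (2);
    INSERT INTO bill_item (id, bill_id) VALUES (10, 1), (11, 1), (20, 2);
""")

# "Delete" bill 1 by flagging the parent; the child rows are not touched.
conn.execute("UPDATE bill SET is_bad = 1 WHERE id = 1")

# Readers join through the parent and skip flagged bills.
live_items = [row[0] for row in conn.execute("""
    SELECT bill_item.id
    FROM bill_item JOIN bill ON bill.id = bill_item.bill_id
    WHERE bill.is_bad = 0
    ORDER BY bill_item.id
""")]
total_items = conn.execute("SELECT COUNT(*) FROM bill_item").fetchone()[0]
```

The trade-off is that every query reading the child rows has to remember to filter on the parent's flag.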

Resources