Do you absolutely need foreign keys in a database?

I was wondering how useful foreign keys really are in a database. Essentially, if the developers know what keys the different tables depend on, they can write the queries just as though there were a foreign key, right?
Also, I do see how foreign-key constraints help prevent all sorts of bugs with data integrity, but say, for example, the programmers do a good job of preserving data integrity: how necessary are foreign keys really?

If you don't care about referential integrity then you are right. But... you should care about referential integrity.
The problem is that people make mistakes. Computers do not.
Regarding your comment:
but say for example, the programmers do a good job of preserving data integrity
Someone will eventually make a mistake. No one is perfect. Also, if you bring someone new in, you can't always be sure of their ability to write "perfect" code.
In addition to that, you lose the ability to do cascading deletes and a number of other features that defined foreign keys allow.
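To make the cascading-delete point concrete, here is a minimal sketch that assumes SQLite, driven from Python's standard sqlite3 module, as a stand-in DBMS; syntax and defaults differ between products. The customer and sales_order tables are hypothetical. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued on each connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on (per connection).
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE sales_order (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL
                    REFERENCES customer(id) ON DELETE CASCADE
    );
""")

conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Alice'), (2, 'Bob')")
conn.executemany("INSERT INTO sales_order (id, customer_id) VALUES (?, ?)",
                 [(10, 1), (11, 1), (12, 2)])

# Deleting the parent row also deletes its orders; no application code involved.
conn.execute("DELETE FROM customer WHERE id = 1")
remaining = [row[0] for row in conn.execute("SELECT id FROM sales_order ORDER BY id")]
print(remaining)  # [12]
```

Without the ON DELETE CASCADE clause, the same DELETE would instead fail with an IntegrityError while orders still reference the customer.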

I think that assuming that programmers will always preserve data integrity is a risky assumption.
There's no reason why you wouldn't create foreign keys, and being able to guarantee integrity instead of just hoping for integrity is reason enough.

Not using referential integrity in a database is like not using seatbelts in cars. It will provide you with measurable improvements in taking you from A->B, but it will make a "real" difference only in the most extreme cases. Why take the "risk" unless you really have to?
The underlying reason people ask this question is always performance.
Foreign keys give the optimizer much more information to work with, and it will potentially produce better execution plans. It's not that a specific query will be some percentage faster with enabled constraints; it's more that you effectively eliminate entire classes of problems caused by bad execution plans. You also enable the optimizer to rewrite queries in ways that just aren't possible without the constraints (join elimination, for example).
Starting right here, I would like to start a myth that referential integrity always increases performance in databases. I'm fairly confident that if 100 people designed their databases with full integrity checking, fewer than 5 of them would ever need to spend a whopping 1 second considering whether to disable the constraints for performance reasons. Of those 5 people, close to 0 would find that they need to disable 100% of the constraints.

Foreign keys are invaluable as a means of ensuring integrity, and even if you trust your developers to never (!) make errors the cost of having them is usually well worth it.
Foreign keys also serve as documentation, in that you can see what relates to what. This information is typically also used by tools, such as report generators, tools that create data sets from table definitions, object-relational mappers, etc. Even if you do not use any of these today, having FKs will make it easier to tread that path later.
Foreign keys also allow you to define cascade rules, which can be used, e.g., to delete associated records in related tables when a row in one table is deleted.
Only if you have ridiculously high loads should you consider bypassing FKs.
Edit: updated answer to include points from other answers (reports, cascades).

You said
but say for example, the programmers do a good job of preserving data integrity
The expression you were looking for is, "I'm 100% certain that every programmer and every database administrator will manually preserve data integrity perfectly no matter what application touches this database, no matter how complex the database becomes, from now until the time it's decommissioned."

You don't have to use them but why wouldn't you?
They are there to help. From making life easier with cascade updates and cascade deletes, to guaranteeing that constraints aren't violated.
Maybe the application honors the constraints, but isn't it useful to have them clearly specified? You could document them, or you could put them in the database where most programmers expect to find constraints they are supposed to conform to (a better idea I think!).
Finally, if you ever need to import data into this database that doesn't go via the front-end, you may accidentally import data which violates the constraints and breaks the application.
I'd definitely not recommend skipping the relationships in a database.
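A rough illustration of the import scenario, again assuming SQLite via Python's sqlite3 and hypothetical product and stock tables: rows loaded directly, bypassing any front-end validation, are still checked by the database, and the bad row is rejected rather than silently stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE stock (
        id         INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES product(id),
        qty        INTEGER NOT NULL
    );
    INSERT INTO product (id, name) VALUES (1, 'widget');
""")

# A bulk load that never passes through the application's validation code.
imported_rows = [(1, 5), (99, 3)]  # product 99 does not exist
rejected = []
for product_id, qty in imported_rows:
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO stock (product_id, qty) VALUES (?, ?)",
                         (product_id, qty))
    except sqlite3.IntegrityError as exc:
        rejected.append((product_id, str(exc)))

print([pid for pid, _ in rejected])  # [99]
```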

Foreign keys make life so much easier when using report builders and data analysis tools. Just select one table, check the "include related tables" box and BAM! You've got your report built. OK, OK, it's not that easy, but they certainly save time in that respect.

Use constraints rather than application logic to enforce integrity because it is generally easier, cheaper and more reliable to maintain constraints in one place (the database) rather than in every application.
I understand from one of your comments that your motivation for asking the question is that you think leaving out the keys may make it easier to evolve the database design during development. In my experience you are wrong about that. I find that it's actually better to be more restrictive with constraints in the early stages of development. If in doubt, create the constraint because it's much easier to remove constraints later than it is to create them. Removing a constraint will tend to break fewer things than adding one and generally requires less testing and fewer code changes to achieve.

Another point to make is that when you scrap your current user interface and use a new one with shiny new tools, you won't lose your referential integrity because the new devs have no idea what should be related to what. Databases are generally in use much, much longer than user interfaces. They are also often used by more than one application interface, and then you have the problem of different interfaces trying to enforce different integrity rules.
I will also point out that I have had occasion to look at the data in, quite literally, hundreds of databases and have not found one yet that has good data if they didn't set up FKs. This bad data complicates reporting, and it complicates imports and exports to and from clients and other third-party vendors who need or provide the data. And if the bad data is in a financial area, it could also have legal and accounting implications. I can even remember one time the company had thousands of bad inventory records where the actual product that was stored was no longer identifiable (nor the location), which also created issues with defining the value of the inventory necessary for financial reporting. This is not only bad from the perspective of not knowing what parts you have on hand, but it also enables people to steal parts without being caught simply by deleting the part number from the part table (this particular place didn't have auditing in place either).

Folks have offered up some good answers above. However, one important point I didn't see mentioned is that foreign keys make your entity relationship diagrams (ERDs) easier to generate and much more meaningful. Without FKs, you either need to depict the FK relationships on your ERD manually (painful for you) or not at all (painful for others, and perhaps even for yourself once your memory of the implied FK relationships starts to fade over time). With FKs explicitly defined, most tools that automatically generate ERDs from database object definitions will automatically detect and depict the FK relationships.
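As a sketch of why tools can draw those relationships automatically: the FK metadata is queryable. The example below assumes SQLite via Python's sqlite3, where PRAGMA foreign_key_list exposes each declared key; the schema and the fk_edges helper are hypothetical. Each returned tuple is one line an ERD tool could draw.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY);
    CREATE TABLE product  (id INTEGER PRIMARY KEY);
    CREATE TABLE sales_order (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id)
    );
    CREATE TABLE order_line (
        order_id   INTEGER REFERENCES sales_order(id),
        product_id INTEGER REFERENCES product(id)
    );
""")

def fk_edges(conn):
    """Return sorted (child_table, child_column, parent_table, parent_column) tuples."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    edges = []
    for table in tables:
        # Row layout: id, seq, table, from, to, on_update, on_delete, match
        for row in conn.execute(f"PRAGMA foreign_key_list({table})"):
            edges.append((table, row[3], row[2], row[4]))
    return sorted(edges)

for child, col, parent, pcol in fk_edges(conn):
    print(f"{child}.{col} -> {parent}.{pcol}")
```

Without declared FKs this list would be empty, and the relationships would have to be drawn by hand.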

Perhaps the question should be "How bad are orphan records?". In many cases orphaned records aren't really going to hurt anything. Yes, these records may persist until the end of time, but how bad is this really? Cascading updates or deletes are rarely useful features. Referential integrity sounds nice, but I think it is not as important as we have been led to believe. The biggest benefit of FKs is the documentation they provide. In my experience, FKs for referential integrity are way more trouble than they are worth.
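To judge how bad orphans are in a particular database, you first have to find them. A minimal sketch of the usual anti-join, assuming SQLite via Python's sqlite3 and hypothetical tables with no declared foreign key (for declared keys, SQLite also offers PRAGMA foreign_key_check):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # no FK declared, so nothing stops orphans
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales_order (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customer VALUES (1, 'Alice');
    INSERT INTO sales_order VALUES (10, 1), (11, 2), (12, 3);
""")

# Anti-join: orders whose customer_id has no matching customer row.
ORPHANS = """
    SELECT o.id, o.customer_id
    FROM sales_order AS o
    LEFT JOIN customer AS c ON c.id = o.customer_id
    WHERE o.customer_id IS NOT NULL AND c.id IS NULL
    ORDER BY o.id
"""
orphans = conn.execute(ORPHANS).fetchall()
print(orphans)  # [(11, 2), (12, 3)]
```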

I have the same question today, and found many articles online about why you don't have to use foreign keys. But so far, 10 of 11 answers here say you should have FKs.
I am not a db expert and just want to share some points I found online about when and why you don't have FKs:
Some points from 9 reasons why there are no foreign keys constraints:
Performance
Legacy data
Full table reload
Higher level framework
Cross database relations
Database platform agnostic
Open for change
Lazy architect
Keep model a secret
Some points from At GitHub we do not use foreign keys, ever, anywhere.
FKs are in your way to shard your database.
FKs are a performance impact.
FKs don't work well with online schema migrations.
Note: I don't have any opinions. Just sharing some online articles to provide a different answer to most of the current ones.

Related

Handle database relations on the server side or in the program

After a few discussions with a colleague, we still don't agree on this topic.
In my opinion it makes more sense to create a properly designed database with all the relations included.
I'm not really experienced in this area, which is why I'm asking you.
Advantages in my opinion
- No "wrong" inserts because of the relation conflicts in the Database
- The database and the program are strictly separated
- Several programs using the same data source require less customization work
- Making the use of LINQ much easier
- and many more.... ?
Possible disadvantages of this way?
What are the advantages of not related Tables?
Transactional systems should "always" have the referential integrity enforced as close to the database as possible. Most people would agree that this is best done right inside the database itself. You have correctly recognized many of the advantages of letting the DBMS enforce referential integrity.
I said "always" above because I believe in common sense and deliberate decisions not rules of thumb.
One reason why someone may not want to enforce referential integrity within the database is that you have a cyclical relationship where the parent and the child need to point to each other and it is not possible to insert one record because the other isn't there yet. This leaves you with a so-called catch-22. In this case, you may need to enforce the referential integrity in program logic. Still, the best place for this is in the data layer, not in the application layer.
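The catch-22 is easy to reproduce. A minimal sketch, assuming SQLite via Python's sqlite3 and hypothetical department and employee tables that point at each other: with ordinary (immediate) constraints, the first insert of the pair fails. As an aside, DBMSs that support deferrable constraints (SQLite and PostgreSQL do) can postpone the check until COMMIT, which is one alternative to moving the rule into program logic.

```python
import sqlite3

def make_db(deferred: bool) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    # Deferred constraints are checked at COMMIT instead of after each statement.
    mode = "DEFERRABLE INITIALLY DEFERRED" if deferred else ""
    conn.executescript(f"""
        CREATE TABLE department (
            id         INTEGER PRIMARY KEY,
            manager_id INTEGER NOT NULL REFERENCES employee(id) {mode}
        );
        CREATE TABLE employee (
            id            INTEGER PRIMARY KEY,
            department_id INTEGER NOT NULL REFERENCES department(id) {mode}
        );
    """)
    return conn

def insert_pair(conn: sqlite3.Connection) -> bool:
    """Insert a department and its manager, each referencing the other."""
    try:
        with conn:  # one transaction: commit on success, roll back on error
            conn.execute("INSERT INTO department (id, manager_id) VALUES (1, 100)")
            conn.execute("INSERT INTO employee (id, department_id) VALUES (100, 1)")
        return True
    except sqlite3.IntegrityError:
        return False

print(insert_pair(make_db(deferred=False)))  # False: first insert fails at once
print(insert_pair(make_db(deferred=True)))   # True: checked only at COMMIT
```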
Another reason why some people don't worry about referential integrity is when the data is read-only. This can happen in a reporting database or data warehouse. Depending on the DBMS, referential integrity may create indexes that are used to enforce the relationships. This can sometimes be a space issue, but more often it is just a problem of making the data warehouse load harder because of the order of operations required.
One more reason why referential integrity is sometimes not used is that archiving old transactional data can get tricky because of complex interrelationships between master tables and transaction tables. You can easily find yourself in a position where it's impossible to delete any data, no matter how old it is, because it is somehow related to something that is related to another thing that is needed by something current.
Having said all of this you should definitely start from the position of using referential integrity features of your database and only back away from this if you have a really good, well considered reason.
Of course! You must enforce referential integrity within your database model! It's safer and more efficient, data integrity is guaranteed, and you do not rely on the programmer. No discussion here.
Unrelated tables are ONLY usable if you are just building a "reporting db" that downloads nightly data from various systems, for example.

Database Designing: An art or headache (Managing relationships)

I have seen in my past experience that most people don't use physical relationships in tables; they try to remember them and enforce them only through code.
Here 'Physical Relationships' refer to Primary Key, Foreign Key, Check constraints, etc.
While designing a database, people try to normalize the database on paper and keep things documented. Like, if I have to create a database for a marketing company, I will try to understand its requirements.
For example, what fields are mandatory, what fields will contain only (a or b or c) etc.
When all the things are clear, then why are most people afraid of constraints?
Don't they want to manage things?
Do they lack knowledge (which I don't think is the case)?
Are they not confident about future problems?
Is it really a tough job managing all these entities?
What is the reason in your opinion?
I always have the DBMS enforce both primary key and foreign key constraints; I often add check constraints too. As far as I am concerned, the data is too important to run the risk of inaccurate data being stored.
If you think of the database as a series of stored true logical propositions, you will see that if the database contains a false proposition - an error - then you can argue to any conclusion you want. Given a false premise, any conclusion is true.
Why don't other people use PK and FK constraints, etc?
Some are unaware of their importance (so lack of knowledge is definitely a factor, even a major factor). Others are scared that they will cost too much in performance, forgetting that one error that has to be fixed may easily use up all the time saved by not having the DBMS do the checking for you. I take the view that if the current DBMS can't handle them well, it might be (probably is) time to change DBMS.
Many developers will check the constraints in code above the database before they actually go to perform an operation. Sometimes, this is driven by user experience considerations (we don't want to present choices / options to users that can't be saved to the database). In other cases, it may be driven by the pain associated with executing a statement, determining why it failed, and then taking corrective action. Most people would consider code more maintainable if it did the check upfront, along with other business logic that might be at play, rather than taking corrective action through an exception handler. (Not that this is necessarily an ideal line of thinking, but it is a prevalent one.) In any case, if you are doing the check in advance of issuing the statement, and not particularly conscious of the fact that the database might get touched by applications / users who are not coming in through your integrity-enforcing code, then you might conclude that database constraints are unnecessary, especially with the performance hit that could be incurred from their use. Also, if you are checking integrity in the application code above the database, one might consider it a violation of DRY (Don't Repeat Yourself) to implement logically equivalent checks in the database itself. The two manifestations of integrity rules (those in database constraints and those in application code above the database) could in principle become out-of-sync if not managed carefully.
Also, I would not discount option 2, that many developers don't know much about database constraints, too readily.
Well, I mean, everyone is entitled to their own opinion and development strategy I suppose, but in my humble opinion these people are almost certainly wrong :)
The reason, however, someone may wish to avoid constraints is efficiency. Not because constraints are slow, but because storing redundant data (i.e. caching) is a very effective way of speeding up (well, avoiding) an expensive calculation. This is an acceptable approach when implemented properly (i.e. the cache is updated at regular/appropriate intervals; generally I do this with a trigger).
As to the motivation not to use FKs without a caching motivation, I can't imagine it. Perhaps they aim to be 'flexible' in their DB structure. If so, fine, but then don't use a relational DB, because it's pointless. Non-relational DBs (OO DBs) certainly have their place, and may even arguably be better (quite arguable, but interesting to argue), but it's a mistake to use a relational DB and not use its core properties.
I would always define PK and FK constraints, especially when using an ORM. It really makes life easy for everybody to let the ORM reverse engineer the database instead of manually configuring it to use some PKs and FKs.
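A minimal sketch of trigger-maintained caching, assuming SQLite via Python's sqlite3; the tables, the order_count column and the trigger names are hypothetical. Note that the redundant column lives alongside a declared foreign key, so caching and constraints are not mutually exclusive. For brevity, this version only handles INSERT and DELETE, not an UPDATE that moves an order to another customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (
        id          INTEGER PRIMARY KEY,
        order_count INTEGER NOT NULL DEFAULT 0   -- cached, redundant value
    );
    CREATE TABLE sales_order (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id)
    );
    -- Keep the cached count in step with the detail rows.
    CREATE TRIGGER order_added AFTER INSERT ON sales_order BEGIN
        UPDATE customer SET order_count = order_count + 1 WHERE id = NEW.customer_id;
    END;
    CREATE TRIGGER order_removed AFTER DELETE ON sales_order BEGIN
        UPDATE customer SET order_count = order_count - 1 WHERE id = OLD.customer_id;
    END;
""")

conn.execute("INSERT INTO customer (id) VALUES (1)")
conn.executemany("INSERT INTO sales_order (id, customer_id) VALUES (?, ?)",
                 [(10, 1), (11, 1), (12, 1)])
conn.execute("DELETE FROM sales_order WHERE id = 11")

cached = conn.execute("SELECT order_count FROM customer WHERE id = 1").fetchone()[0]
actual = conn.execute("SELECT COUNT(*) FROM sales_order WHERE customer_id = 1").fetchone()[0]
print(cached, actual)  # 2 2
```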
There are several reasons for not enforcing relationships in descending order of importance:
People-friendly error handling.
Your program should check constraints and send an intelligible message to the user. For some reason normal people don't like "SQL exception code -100013 goble rule violated for table gook".
Operational flexibility.
You don't really want your operators trying to figure out which order you must load your tables in at 3 a.m., nor do you want your testers pulling their hair out because they cannot reset the database back to its starting position.
Efficiency.
Checking constraints does consume IO and CPU.
Functionality.
It's a cheap way to save details for later recovery. For instance, in an online order system you could leave the detail item rows in the table when the user kills a parent order; if he later reinstates the order, the details reappear as if by a miracle. You achieve this extra feature by deleting lines of code (of course you need some housekeeping process, but it is trivial!).
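On the first point, friendly messages and database constraints can coexist. A minimal sketch, assuming SQLite via Python's sqlite3 and a hypothetical add_line helper: the application checks up front to produce a readable message, and the declared constraint remains a backstop that is translated into a readable message too, rather than shown raw.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE sales_order (id INTEGER PRIMARY KEY);
    CREATE TABLE order_line (
        id       INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES sales_order(id),
        item     TEXT NOT NULL
    );
    INSERT INTO sales_order (id) VALUES (1);
""")

def add_line(conn: sqlite3.Connection, order_id: int, item: str) -> str:
    """Insert an order line and return a message fit to show a user."""
    # Up-front check, purely for a friendly message...
    if conn.execute("SELECT 1 FROM sales_order WHERE id = ?", (order_id,)).fetchone() is None:
        return f"Order {order_id} no longer exists; please refresh and try again."
    try:
        with conn:
            conn.execute("INSERT INTO order_line (order_id, item) VALUES (?, ?)",
                         (order_id, item))
    except sqlite3.IntegrityError:
        # ...and the constraint as a backstop if the order vanished in between.
        return f"Order {order_id} was changed by someone else; please refresh."
    return f"Added {item} to order {order_id}."

print(add_line(conn, 1, "widget"))  # Added widget to order 1.
print(add_line(conn, 2, "gadget"))  # Order 2 no longer exists; please refresh and try again.
```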
As things get more complex and more tables and relationships are needed in the database, how can you ensure the database developer remembers to check all of them? When you make a change to the schema that adds a new "informal" relationship, how can you ensure all the application code which might be affected gets changed?
Suddenly you could be deleting records that should stay because they have related data the developer forgot to check when writing the delete process, or because that process was in place before the last ten related tables were added to the schema.
It is foolhardy in the extreme to not formally set up PK/FK relationships. I process data received from many different vendors and databases. You can tell which ones have data integrity problems most likely caused by a failure to explicitly define relationships by the poor quality of their data.

Should referential integrity be enforced?

One of the reasons why referential integrity should not be enforced is performance. Because the DB has to validate every update against the relationships, it makes things slower. But what are the other pros and cons of enforcing and not enforcing it?
Because relationships are maintained in the business logic layer anyway, it is redundant for the DB to do it as well. What are your thoughts on it?
The database is responsible for data. That's it. Period.
If referential integrity is not done in the database, then it's not integrity. It's just trusting people not to do bad things, in which case you probably shouldn't even worry about password-protecting your data either :-)
Who's to say you won't get someone writing their own JDBC-connected client to totally screw up the data, despite your perfectly crafted and bug-free business layer (the fact that it probably won't be bug-free is another issue entirely, mandating that the DB should protect itself).
First of all, it's almost impossible to make it really work correctly. To have any chance of working right, you need to wrap a lot of the cascading modifications as transactions, so you don't have things out of sync while you've changed one part of the database, but are still updating others that depend on the first. This means code that should be simple and aware only of business logic suddenly needs to know about all sorts of concurrency issues.
Second, keeping it working is almost impossible to hope for -- every time anybody touches the business logic, they need to deal with those concurrency issues again.
Third, this makes the referential integrity difficult to understand -- in the future, when somebody wants to learn about your database structure, they'll have to reverse engineer it out of your business logic. With it in the database, it's separate, so what you have to look at only deals with referential integrity, not all sorts of unrelated issues. You have (for example) direct chains of logic showing what a modification to a particular field will trigger. At least for quite a few databases, that logic can be automatically extracted and turned into fairly useful documentation (e.g., tree diagrams showing dependencies). Extracting the same kind of information from the BLL is more likely to be a fairly serious project.
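To see what "crafting it by hand" involves, here is a minimal sketch of an application-side cascade, assuming SQLite via Python's sqlite3, hypothetical tables with no declared keys, and a hypothetical delete_customer helper. Every dependent table has to be listed in the right order and wrapped in one transaction, and any table added to the schema later must also be added here by someone who remembers to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # no FK constraints declared at all
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY);
    CREATE TABLE sales_order (id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE order_line (id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO customer VALUES (1);
    INSERT INTO sales_order VALUES (10, 1);
    INSERT INTO order_line VALUES (100, 10), (101, 10);
""")

def delete_customer(conn: sqlite3.Connection, customer_id: int) -> None:
    """Hand-written cascade: grandchildren, then children, then the parent."""
    with conn:  # all three deletes commit together or not at all
        conn.execute("""DELETE FROM order_line WHERE order_id IN
                        (SELECT id FROM sales_order WHERE customer_id = ?)""",
                     (customer_id,))
        conn.execute("DELETE FROM sales_order WHERE customer_id = ?", (customer_id,))
        conn.execute("DELETE FROM customer WHERE id = ?", (customer_id,))

delete_customer(conn, 1)
counts = [conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
          for t in ("customer", "sales_order", "order_line")]
print(counts)  # [0, 0, 0]
```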
There are certainly some points in the other direction, and reasons to craft all of this by hand -- scalability and performance being the most obvious. When/if you go that route, however, you should be aware of what you're giving up to get that performance. In some cases, it's a worthwhile tradeoff -- but in other cases it's not, and you need information to make a reasoned decision.
Relationships may be maintained in a business logic layer. Unless you can guarantee 100% beyond any doubt that your BLL is and always will be bug-free, then you don't have data integrity. And you can't make that guarantee.
Also, if another app will ever touch your database, it isn't required to follow (read: reimplement, maybe in a subtly wrong way) the rules in your BLL. It could corrupt the data, even if you somehow managed to be one of the 3 programmers on Earth to write bug-free code.
The database, meanwhile, enforces the same rules for everybody -- and rules enforced by the database are far less likely to be overlooked when you're updating, since the DB won't allow it.
Have a listen to Dan Pritchett, Technical Fellow at eBay on why certain database constructs such as transactions and referential integrity are not the mandates that textbooks might indicate they should be... It comes down to the types of data, the volume of queries and business requirements. Balance those and it will lead you to pragmatic solutions, not dogmatic answers...
However, do not assume that keeping relationships in the BLL will protect your data. You cannot guarantee that future developers won't expose new APIs that bypass the BLL for "performance" reasons, or simple lack of understanding of your architecture...
The performance assumption on which the question is based is incorrect as a general rule. Usually if you require RI to be enforced then the database is the most efficient place to do it, NOT the application - otherwise the application has to requery more data in order to be able to validate RI outside the database.
Also, RI constraints in the database are useful for the query optimiser for making other queries more efficient. Integrity constraints in the application can't achieve that.
Lastly, maintaining integrity constraints in every application is generally more expensive and complex than doing it once in one place.
But Colonel Ingus, if you've got the customer with an id in the session, you've already probed the database! The problem is when you then write your sales order away but didn't attach it to a product because you didn't probe for a product. One way or another you'll end up with orphaned records, just like the very large company I'm currently working for has. We have customers with no history and history with no customers; customers with outstanding balances who've never bought anything and goods sold to customers who don't exist - interesting business concepts - and it keeps a team of very frustrated support staff in full-time employment trying to sort it out. It would be far less expensive to have put RI on everything and bought a bigger box to sort out any perceived performance problems.
A lot has already been said about the fact that the DB should be the final place to validate/control your constraints (and I couldn't agree more).
If the data is important, then your application won't be the last to access the database and it won't be the only one.
But there is another very important fact about referential integrity (and other constraints): it documents your datamodel and makes the dependencies between the tables explicit.
As far as performance is concerned, defining FKs (or other constraints) in the database can make things even faster in certain cases, because the DBMS can rely on the constraints and make appropriate optimizations.
It depends on the data. If it's highly transactional data, such as business transactions, where frequent updates are happening, then enforcing the business rules in the database is extremely important. But for everything else, the performance impact may not be worth it.
What paxdiablo and dportas said. And my two cents. There are two other considerations.
In order to validate referential integrity for a new insert, you have to do a probe into the database to verify that the reference is valid. You just nullified the performance gain that led you to want to enforce integrity in the application. It's actually faster to let the DBMS enforce referential integrity.
Beyond that, consider the case where you have more than one application all reading and writing data in a single database. If you enforce referential integrity in the business application layer, you have to make sure that all of the applications do things right. Otherwise, some aberrant application could store invalid references, and the problem could surface when a different application went to use the data. That's a real mess.
Better to have the DBMS enforce the data rules for all the applications.
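A minimal sketch of the probe being described, assuming SQLite via Python's sqlite3, hypothetical tables without a declared key, and a hypothetical insert_order_checked helper. The SELECT is the extra round trip the application must make before each insert when no FK is declared, and it still leaves a window between the check and the write.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # no FK declared: the application is responsible
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY);
    CREATE TABLE sales_order (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL);
    INSERT INTO customer VALUES (1);
""")

def insert_order_checked(conn: sqlite3.Connection, customer_id: int) -> None:
    """Application-side referential check: probe the parent, then insert."""
    # The probe: an extra query the application runs before every insert.
    row = conn.execute("SELECT 1 FROM customer WHERE id = ?", (customer_id,)).fetchone()
    if row is None:
        raise ValueError(f"unknown customer {customer_id}")
    # Another session could delete the customer between the probe and this
    # insert; a declared foreign key would catch that, this code cannot.
    with conn:
        conn.execute("INSERT INTO sales_order (customer_id) VALUES (?)", (customer_id,))

insert_order_checked(conn, 1)
try:
    insert_order_checked(conn, 2)
except ValueError as exc:
    print(exc)  # unknown customer 2
```

If the customer id is already known from the session, as argued further down, the probe can be skipped, but then nothing verifies that the customer still exists.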
If you are maintaining the relationships in the business layer, you can guarantee that a few years down the pike you will have bad data in the database. The business layer is the worst possible place to do that.
Further, when you replace the business layer with something else, you have to redefine all these things. Databases often outlast the original application they were written for by many years, so put the correct relationships and constraints in the database where they belong.
What happens when you try to insert a record into the database and it fails referential integrity? You get an error from the database. Then you have to change your code so that it doesn't try to insert invalid data. To avoid ref integrity errors your code MUST know which data is which. Therefore, referential integrity is useless.
Walter Mitty said "In order to validate referential integrity for a new insert, you have to do a probe into the database to verify that the reference is valid." Sigh... this is complete nonsense. If I have a Customer object in the session (that's memory, aka RAM for some of you fellas), I know the Customer's ID and can use it to insert a SalesOrder object. There is no need to look up the Customer.
I am on a system now with tight Referential Integrity and Hibernate wrapped around it with its gross tentacles. It's the slowest system I have ever seen. I did not design it, and if I had, it would be many times faster AND easier to maintain. Hibernate sucks.

Pros and cons of programmatically enforcing foreign key than in database

Letting the database enforce foreign keys is causing a lot of trouble during development. Especially during unit tests, I can't drop tables because of foreign key constraints, and I need to create tables in an order that doesn't trigger foreign key constraint errors. In reality I don't see much point in letting the database enforce foreign key constraints. If the application has been properly designed there should not be any manual database manipulation other than select queries. I just want to make sure that I am not digging myself into a hole by not having foreign key constraints in the database and leaving them solely as the application's responsibility. Am I missing anything?
P.S. my real unit tests (not those that use mocking) will drop existing tables if the structure of underlying domain object has been modified.
In my experience, if you don't enforce foreign keys in a database, then eventually (assuming the database is relatively large and heavily used) you will end up with orphaned records. This can happen in many ways, but it always seems to happen.
If you index properly, there should not be any performance advantages to foreign keys.
So the question is, does the potential damage/hassle/support cost/financial cost of having orphaned records in your database outweigh the development and testing hassle?
In my experience, for business applications I always use foreign keys. It should just be a one-time setup cost to get your build scripts working correctly, and the data stability will more than pay for that over the life of an application.
The point of enforcing the rules in the database is that it's declarative, e.g. you do not have to write a ton of code to handle it.
As far as your unit tests, just delete tables in the proper order. You just have to write a function to do it right once.
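One way to write that function once, sketched under assumptions: SQLite via Python's sqlite3 for the FK metadata, graphlib from the standard library (Python 3.9+) for the topological sort, and a hypothetical drop_order helper and schema. Children come before the parents they reference; reversing the list gives a valid creation or load order. A cycle between tables raises graphlib.CycleError.

```python
import sqlite3
from graphlib import TopologicalSorter

def drop_order(conn: sqlite3.Connection) -> list:
    """Return table names ordered so that children precede their parents."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    # graph[parent] = {children}: a parent is emitted only after all its children.
    graph = {t: set() for t in tables}
    for child in tables:
        for row in conn.execute(f"PRAGMA foreign_key_list({child})"):
            parent = row[2]
            if parent != child:  # ignore self-references
                graph.setdefault(parent, set()).add(child)
    return list(TopologicalSorter(graph).static_order())

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY);
    CREATE TABLE sales_order (id INTEGER PRIMARY KEY,
                              customer_id INTEGER REFERENCES customer(id));
    CREATE TABLE order_line (id INTEGER PRIMARY KEY,
                             order_id INTEGER REFERENCES sales_order(id));
    INSERT INTO customer VALUES (1);
    INSERT INTO sales_order VALUES (10, 1);
    INSERT INTO order_line VALUES (100, 10);
""")

order = drop_order(conn)
for table in order:
    conn.execute(f"DROP TABLE {table}")
print(order)  # ['order_line', 'sales_order', 'customer']
```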
Your issues in development should not drive the DB design. Constantly rebuilding a DB is a developer use case, not a customer use case.
Also, the DB constraints help beyond your application. You never know what your customer might try to do. Don't overdo it, but you need a few.
It might seem like you can rely on your applications to follow implied rules, but unless you enforce them eventually someone will make a mistake.
Or maybe 5 years from now someone will do a tidy-up of old records "which are no longer needed" and not realise that there is data in other tables still referencing them. Then a few days/weeks later you or your successor gets the fun job of trying to repair the mess that the database has got in to. :-)
Here's a nice discussion on that in a previous question on SO: What's wrong with foreign keys?. [Edit]: The argument there is to create non-enforced foreign keys, to keep some of the pros, if any of the cons apply.
If the application has been properly designed there should not be any manual database manipulation other than select queries
What? What kind of koolaid are you drinking? Most database applications exist to manipulate the data in the database, not just to see it. Generally the whole purpose of the application is to add the new orders or create the new customer records or document the customer service calls, etc.
Foreign keys are for data integrity. Data integrity is critical to being able to use the data with any reliability. Databases without data integrity are useless and can cause companies to lose money. This trumps your self-centered view that FKs aren't needed because they make development more complicated for you. The data is far more important than your convenience in running tests (which can be written to account for the FKs).

Are foreign keys really necessary in a database design?

As far as I know, foreign keys (FK) are used to aid the programmer to manipulate data in the correct way. Suppose a programmer is actually doing this in the right manner already, then do we really need the concept of foreign keys?
Are there any other uses for foreign keys? Am I missing something here?
Foreign keys help enforce referential integrity at the data level. They also improve performance because they're normally indexed by default.
Foreign keys can also help the programmer write less code using things like ON DELETE CASCADE. This means that if you have one table containing users and another containing orders or something, then deleting a user could automatically delete all orders that point to that user.
I can't imagine designing a database without foreign keys. Without them, eventually you are bound to make a mistake and corrupt the integrity of your data.
They are not required, strictly speaking, but the benefits are huge.
I'm fairly certain that FogBugz does not have foreign key constraints in the database. I would be interested to hear how the Fog Creek Software team structures their code to guarantee that they will never introduce an inconsistency.
A database schema without FK constraints is like driving without a seat belt.
One day, you'll regret it. Not spending that little extra time on design fundamentals and data integrity is a surefire way of guaranteeing headaches later.
Would you accept code in your application that was that sloppy, code that accessed member objects and modified data structures directly?
Why do you think this has been made hard and even unacceptable within modern languages?
Yes.
They keep you honest
They keep new developers honest
You can do ON DELETE CASCADE
They help you generate nice diagrams that make the links between tables self-explanatory
Suppose a programmer is actually doing this in the right manner already
Making such a supposition seems to me to be an extremely bad idea; in general software is phenomenally buggy.
And that's the point, really. Developers can't get things right, so ensuring the database can't be filled with bad data is a Good Thing.
Although in an ideal world, natural joins would use relationships (i.e. FK constraints) rather than matching column names. This would make FKs even more useful.
Personally, I am in favor of foreign keys because it formalizes the relationship between the tables. I realize that your question presupposes that the programmer is not introducing data that would violate referential integrity, but I have seen way too many instances where data referential integrity is violated, despite best intentions!
Before foreign key constraints (aka declarative referential integrity, or DRI), a lot of time was spent implementing these relationships with triggers. The fact that we can formalize the relationship with a declarative constraint is very powerful.
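To illustrate the difference, here is a rough sketch (SQLite via Python's sqlite3; the customers/orders schema is hypothetical) of the same rule written both ways: once with hand-written triggers and once as a single declarative REFERENCES clause. Even with three triggers, the trigger version still leaves a gap (changing customers.id is not guarded), which is part of why DRI is the easier thing to get right.

```python
import sqlite3

# Rule enforced by hand-written triggers (no REFERENCES clause anywhere).
TRIGGER_SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL);

CREATE TRIGGER orders_ins BEFORE INSERT ON orders
WHEN NOT EXISTS (SELECT 1 FROM customers WHERE id = NEW.customer_id)
BEGIN SELECT RAISE(ABORT, 'unknown customer'); END;

CREATE TRIGGER orders_upd BEFORE UPDATE OF customer_id ON orders
WHEN NOT EXISTS (SELECT 1 FROM customers WHERE id = NEW.customer_id)
BEGIN SELECT RAISE(ABORT, 'unknown customer'); END;

CREATE TRIGGER customers_del BEFORE DELETE ON customers
WHEN EXISTS (SELECT 1 FROM orders WHERE customer_id = OLD.id)
BEGIN SELECT RAISE(ABORT, 'customer has orders'); END;
-- An UPDATE of customers.id would need yet another trigger (omitted).
"""

# The same rule as declarative referential integrity.
DECLARATIVE_SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id)
);
"""

def rejects_orphans(schema: str) -> bool:
    """True if the schema refuses both an orphan insert and deleting a referenced parent."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # needed for the declarative version
    conn.executescript(schema)
    conn.execute("INSERT INTO customers (id) VALUES (1)")
    conn.execute("INSERT INTO orders (customer_id) VALUES (1)")
    for bad in ("INSERT INTO orders (customer_id) VALUES (99)",
                "DELETE FROM customers WHERE id = 1"):
        try:
            conn.execute(bad)
            return False  # the bad statement got through
        except sqlite3.DatabaseError:  # IntegrityError is a subclass
            pass
    return True

print(rejects_orphans(TRIGGER_SCHEMA), rejects_orphans(DECLARATIVE_SCHEMA))  # True True
```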
@John: Other databases may automatically create indexes for foreign keys, but SQL Server does not. In SQL Server, foreign key relationships are only constraints. You must define indexes on foreign keys separately (which can be a benefit).
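SQLite behaves like SQL Server here. A quick check, assuming SQLite through Python's sqlite3 and a hypothetical orders table, shows that declaring the foreign key creates no index, so one is added explicitly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(id))"
)

def index_names(table: str) -> list[str]:
    # PRAGMA index_list returns one row per index; column 1 is the index name.
    return [row[1] for row in conn.execute(f"PRAGMA index_list({table})")]

before = index_names("orders")  # declaring the FK created no index
conn.execute("CREATE INDEX idx_orders_customer_id ON orders(customer_id)")
after = index_names("orders")
print(before, after)  # [] ['idx_orders_customer_id']
```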
Edit: I'd like to add that, IMO, the use of foreign keys in support of ON DELETE or ON UPDATE CASCADE is not necessarily a good thing. In practice, I have found that cascade on delete should be carefully considered based on the relationship of the data -- e.g. do you have a natural parent-child where this may be OK or is the related table a set of lookup values. Using cascaded updates implies you are allowing the primary key of one table to be modified. In that case, I have a general philosophical disagreement in that the primary key of a table should not change. Keys should be inherently constant.
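For reference, this is what ON UPDATE CASCADE actually does when a primary key changes. The sketch assumes SQLite via Python's sqlite3 and a hypothetical countries lookup table keyed by a natural code, exactly the kind of key the answer argues should not change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# Hypothetical lookup table keyed by a natural (and therefore changeable) code.
conn.execute("CREATE TABLE countries (code TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE customers (
           id INTEGER PRIMARY KEY,
           country_code TEXT NOT NULL
               REFERENCES countries(code) ON UPDATE CASCADE)"""
)
conn.execute("INSERT INTO countries VALUES ('UK', 'United Kingdom')")
conn.executemany("INSERT INTO customers (country_code) VALUES (?)", [("UK",), ("UK",)])

# Changing the parent's primary key rewrites every referencing child row.
conn.execute("UPDATE countries SET code = 'GB' WHERE code = 'UK'")
codes = [row[0] for row in conn.execute("SELECT country_code FROM customers ORDER BY id")]
print(codes)  # ['GB', 'GB']
```

Without ON UPDATE CASCADE, the default NO ACTION behavior would make the same UPDATE fail with a foreign key error, which is arguably the safer outcome for a lookup table.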
Without a foreign key how do you tell that two records in different tables are related?
I think what you are referring to is referential integrity, where the child record is not allowed to be created without an existing parent record etc. These are often known as foreign key constraints - but are not to be confused with the existence of foreign keys in the first place.
I suppose you are talking about foreign key constraints enforced by the database. You probably already are using foreign keys, you just haven't told the database about it.
Suppose a programmer is actually doing this in the right manner already, then do we really need the concept of foreign keys?
Theoretically, no. However, there has never been a piece of software without bugs.
Bugs in application code are typically not that dangerous: you identify the bug and fix it, and after that the application runs smoothly again. But if a bug allows corrupt data to enter the database, then you are stuck with it! It's very hard to recover from corrupt data in the database.
Consider if a subtle bug in FogBugz allowed a corrupt foreign key to be written to the database. It might be easy to fix the bug and quickly push the fix to customers in a bugfix release. However, how should the corrupt data in dozens of databases be fixed? Correct code might now suddenly break because the assumptions about the integrity of foreign keys don't hold anymore.
In web applications you typically only have one program speaking to the database, so there is only one place where bugs can corrupt the data. In an enterprise application there might be several independent applications speaking to the same database (not to mention people working directly with the database shell). There is no way to be sure that all applications follow the same assumptions without bugs, always and forever.
If constraints are encoded in the database, then the worst that can happen with bugs is that the user is shown an ugly error message about some SQL constraint not being satisfied. This is much preferable to letting corrupt data into your enterprise database, where it will in turn break all your applications or just lead to all kinds of wrong or misleading output.
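A small sketch of that trade-off, assuming SQLite via Python's sqlite3; save_case and the projects/cases tables are hypothetical stand-ins for application code that receives a bad key from some upstream bug:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE cases (id INTEGER PRIMARY KEY, "
    "project_id INTEGER NOT NULL REFERENCES projects(id))"
)
conn.execute("INSERT INTO projects (id) VALUES (1)")

def save_case(project_id: int) -> str:
    """Hypothetical application code; a bug upstream may pass a bad project_id."""
    try:
        conn.execute("INSERT INTO cases (project_id) VALUES (?)", (project_id,))
        return "saved"
    except sqlite3.IntegrityError as exc:
        # The bad write is refused: an ugly error instead of a dangling reference.
        return f"rejected: {exc}"

print(save_case(1))   # saved
print(save_case(42))  # rejected: FOREIGN KEY constraint failed
orphans = conn.execute(
    "SELECT COUNT(*) FROM cases WHERE project_id NOT IN (SELECT id FROM projects)"
).fetchone()[0]
print(orphans)  # 0
```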
Oh, and foreign key constraints can also improve performance, since some databases index them automatically (though not all do). I can't think of any reason not to use foreign key constraints.
Is there a benefit to not having foreign keys? Unless you are using a crappy database, FKs aren't that hard to set up. So why would you have a policy of avoiding them? It's one thing to have a naming convention that says a column references another, it's another to know the database is actually verifying that relationship for you.
FKs are very important and should always exist in your schema, unless you are eBay.
I think some single thing must ultimately be responsible for ensuring valid relationships.
For example, Ruby on Rails does not use foreign keys, but it validates all the relationships itself. If you only ever access your database from that Ruby on Rails application, this is fine.
However, if you have other clients which are writing to the database, then without foreign keys they need to implement their own validation. You then have two copies of the validation code which are most likely different, which any programmer should be able to tell is a cardinal sin.
At that point, foreign keys really are necessary, as they allow you to move the responsibility back to a single point.
Foreign keys allow someone who has not seen your database before to determine the relationship between tables.
Everything may be fine now, but think what will happen when your programmer leaves and someone else has to take over.
Foreign keys will allow them to understand the database structure without trawling through thousands of lines of code.
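As an example of that discoverability, declared foreign keys can be read back from the catalog. This sketch assumes SQLite (via Python's sqlite3), where PRAGMA foreign_key_list exposes them; other databases offer the same information through their own catalog views. The schema is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY);
CREATE TABLE products  (id INTEGER PRIMARY KEY);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    product_id  INTEGER REFERENCES products(id)
);
""")

def relationships(conn: sqlite3.Connection) -> list[tuple[str, str, str, str]]:
    """Return (child_table, child_column, parent_table, parent_column) for every declared FK."""
    found = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # PRAGMA foreign_key_list columns: id, seq, table, from, to, on_update, on_delete, match
        for row in conn.execute(f"PRAGMA foreign_key_list({table})"):
            found.append((table, row[3], row[2], row[4]))
    return sorted(found)

for rel in relationships(conn):
    print(rel)
# ('orders', 'customer_id', 'customers', 'id')
# ('orders', 'product_id', 'products', 'id')
```

Diagramming tools and ORMs typically rely on this same kind of catalog information, which is why they work better when the keys are declared.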
As far as I know, foreign keys are used to aid the programmer to manipulate data in the correct way.
FKs allow the DBA to protect data integrity from the fumbling of users when the programmer fails to do so, and sometimes to protect against the fumbling of programmers.
Suppose a programmer is actually doing this in the right manner already, then do we really need the concept of foreign keys?
Programmers are mortal and fallible. FKs are declarative which makes them harder to screw up.
Are there any other uses for foreign keys? Am I missing something here?
Although this is not why they were created, FKs provide strong reliable hinting to diagramming tools and to query builders. This is passed on to end users, who desperately need strong reliable hints.
They are not strictly necessary, in the way that seatbelts are not strictly necessary. But they can really save you from doing something stupid that messes up your database.
It's so much nicer to debug a FK constraint error than have to reconstruct a delete that broke your application.
They are important, because your application is not the only way data can be manipulated in the database. Your application may handle referential integrity as honestly as it wants, but all it takes is one bozo with the right privileges to come along and issue an insert, delete or update command at the database level, and all your application referential integrity enforcement is bypassed. Putting FK constraints in at the database level means that, barring this bozo choosing to disable the FK constraint before issuing their command, the FK constraint will cause a bad insert/update/delete statement to fail with a referential integrity violation.
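The scenario can be sketched as follows, assuming SQLite via Python's sqlite3. app_place_order and ad_hoc_insert are hypothetical: the first mimics an application that validates before writing, the second mimics someone issuing SQL directly.

```python
import sqlite3

def make_db(with_fk: bool) -> sqlite3.Connection:
    """Hypothetical customers/orders schema, with or without a declared FK."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    ref = " REFERENCES customers(id)" if with_fk else ""
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
    conn.execute(
        f"CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL{ref})"
    )
    conn.execute("INSERT INTO customers (id) VALUES (1)")
    return conn

def app_place_order(conn: sqlite3.Connection, customer_id: int) -> None:
    """The application's own check: it only runs when writes go through the app."""
    if conn.execute("SELECT 1 FROM customers WHERE id = ?", (customer_id,)).fetchone() is None:
        raise ValueError("unknown customer")
    conn.execute("INSERT INTO orders (customer_id) VALUES (?)", (customer_id,))

def ad_hoc_insert(conn: sqlite3.Connection, customer_id: int) -> bool:
    """A direct INSERT that never touches the application code. True if it succeeded."""
    try:
        conn.execute("INSERT INTO orders (customer_id) VALUES (?)", (customer_id,))
        return True
    except sqlite3.IntegrityError:
        return False

no_fk, fk = make_db(with_fk=False), make_db(with_fk=True)
app_place_order(no_fk, 1)        # valid order, passes the app check
print(ad_hoc_insert(no_fk, 99))  # True: orphan written, app check bypassed
print(ad_hoc_insert(fk, 99))     # False: the database itself refuses it
```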
I think about it in terms of cost/benefit... In MySQL, adding a constraint is a single additional line of DDL. It's just a handful of key words and a couple of seconds of thought. That's the only "cost" in my opinion...
Tools love foreign keys. Foreign keys prevent bad data (that is, orphaned rows) that may not affect business logic or functionality and therefore goes unnoticed and builds up. They also prevent developers who are unfamiliar with the schema from implementing entire chunks of work without realizing they're missing a relationship. Perhaps everything is great within the scope of your current application, but if you missed something and someday something unexpected is added (think fancy reporting), you might be in a spot where you have to manually clean up bad data that has been accumulating since the inception of the schema without a database-enforced check.
The little time it takes to codify what's already in your head when you're putting things together could save you or someone else a bunch of grief months or years down the road.
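When a schema has been running without a database-enforced check, orphans like these can at least be found with an anti-join. A sketch assuming SQLite via Python's sqlite3 and hypothetical customers/orders tables that declare no foreign key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No FK declared, so nothing stops orphans from piling up.
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL);
INSERT INTO customers (id) VALUES (1), (2);
INSERT INTO orders (id, customer_id) VALUES (10, 1), (11, 2), (12, 3), (13, 7);
""")

# Anti-join: orders whose customer_id matches no customer row.
ORPHANS_SQL = """
SELECT o.id, o.customer_id
FROM orders AS o
LEFT JOIN customers AS c ON c.id = o.customer_id
WHERE c.id IS NULL
ORDER BY o.id
"""
orphans = conn.execute(ORPHANS_SQL).fetchall()
print(orphans)  # [(12, 3), (13, 7)]
```

Finding them is the easy part; deciding what each orphan should have pointed to is the manual cleanup the answer warns about.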
The question:
Are there any other uses for foreign keys? Am I missing something here?
It is a bit loaded. Insert comments, indentation or variable naming in place of "foreign keys"... If you already understand the thing in question perfectly, it's "no use" to you.
Entropy reduction. Reduce the potential for chaotic scenarios to occur in the database.
We have a hard time as it is considering all the possibilities, so, in my opinion, entropy reduction is key to the maintenance of any system.
When we make an assumption, for example "each order has a customer", that assumption should be enforced by something. In databases, that "something" is foreign keys.
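One detail worth spelling out for the "each order has a customer" rule: a foreign key only checks values that are present, so a NULL reference passes. The rule needs NOT NULL as well. A sketch assuming SQLite via Python's sqlite3 (the accepts helper and the tables are hypothetical):

```python
import sqlite3
from typing import Optional

def accepts(column_constraints: str, customer_id: Optional[int]) -> bool:
    """True if an order with this customer_id is accepted under the given column constraints."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
    conn.execute(
        f"CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER {column_constraints})"
    )
    try:
        conn.execute("INSERT INTO orders (customer_id) VALUES (?)", (customer_id,))
        return True
    except sqlite3.IntegrityError:
        return False

fk_only = "REFERENCES customers(id)"
fk_not_null = "NOT NULL REFERENCES customers(id)"
print(accepts(fk_only, None))      # True: NULL slips past a bare FK
print(accepts(fk_not_null, None))  # False: NOT NULL closes the gap
print(accepts(fk_not_null, 5))     # False: there is no customer 5
```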
I think this is worth the tradeoff in development speed. Sure, you can code quicker with them off and this is probably why some people don't use them. Personally I have killed a number of hours with NHibernate and some foreign key constraint that gets angry when I perform some operation. HOWEVER, I know what the problem is so it's less of a problem. I'm using normal tools and there are resources to help me work around this, possibly even people to help!
The alternative is allow a bug to creep into the system (and given enough time, it will) where a foreign key isn't set and your data becomes inconsistent. Then, you get an unusual bug report, investigate and "OH". The database is screwed. Now how long is that going to take to fix?
You can view foreign keys as constraints that:
Help maintain data integrity
Show how data is related to each other (which can help in enforcing business logic and rules)
If used correctly, can help increase the efficiency with which the data is fetched from the tables.
We don't currently use foreign keys. And for the most part we don't regret it.
That said, we're likely to start using them a lot more in the near future, for two closely related reasons:
Diagramming. It's so much easier to produce a diagram of a database if there are foreign key relationships correctly used.
Tool support. It's a lot easier to build data models using Visual Studio 2008 that can be used for LINQ to SQL if there are proper foreign key relationships.
So I guess my point is that we've found that if we're doing a lot of manual SQL work (construct query, run query, blahblahblah) foreign keys aren't necessarily essential. Once you start getting into using tools, though, they become a lot more useful.
The best thing about foreign key constraints (and constraints in general, really) is that you can rely on them when writing your queries. A lot of queries become much more complicated if you can't rely on the data model holding "true".
In code, we'll generally just get an exception thrown somewhere - but in SQL, we'll generally just get the "wrong" answers.
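A small illustration of the "wrong answers" point, assuming SQLite via Python's sqlite3 and hypothetical tables with no foreign key: an inner join silently drops an order whose customer does not exist, so two totals that should agree do not, and nothing raises an error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No FK: an order can reference a customer that does not exist.
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT NOT NULL);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL,
                     amount INTEGER NOT NULL);
INSERT INTO customers (id, region) VALUES (1, 'north'), (2, 'south');
INSERT INTO orders (customer_id, amount) VALUES (1, 100), (2, 50), (3, 25);
""")

total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
joined = conn.execute(
    """SELECT SUM(o.amount)
       FROM orders AS o JOIN customers AS c ON c.id = o.customer_id"""
).fetchone()[0]
# No exception is raised; the two "revenue" figures simply disagree.
print(total, joined)  # 175 150
```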
In theory, SQL Server could use constraints as part of a query plan - but except for check constraints for partitioning, I can't say that I've ever actually witnessed that.
Foreign keys were never explicitly declared (FOREIGN KEY REFERENCES table(column)) in the projects (business applications and social networking websites) I worked on.
But there was always a naming convention for columns that were foreign keys.
It's like database normalization: you have to know what you are doing and what the consequences are (mainly for performance).
I am aware of the advantages of foreign keys (data integrity, an index on the foreign key column, tools that are aware of the database schema), but I am also afraid of using foreign keys as a general rule.
Also, various database engines may handle foreign keys differently, which could lead to subtle bugs during migration.
Removing all orders and invoices of a deleted client with ON DELETE CASCADE is the perfect example of a nice-looking but badly designed database schema.
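The difference between CASCADE and RESTRICT for that client/order/invoice chain, sketched with SQLite via Python's sqlite3 (the tables are hypothetical). With CASCADE, deleting the client also deletes the invoices two levels down; with RESTRICT, the delete is refused while dependent rows exist:

```python
import sqlite3

def build(on_delete: str) -> sqlite3.Connection:
    """clients -> orders -> invoices, every FK using the given ON DELETE action."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(f"""
    CREATE TABLE clients  (id INTEGER PRIMARY KEY);
    CREATE TABLE orders   (id INTEGER PRIMARY KEY,
                           client_id INTEGER NOT NULL REFERENCES clients(id) ON DELETE {on_delete});
    CREATE TABLE invoices (id INTEGER PRIMARY KEY,
                           order_id INTEGER NOT NULL REFERENCES orders(id) ON DELETE {on_delete});
    INSERT INTO clients (id) VALUES (1);
    INSERT INTO orders (id, client_id) VALUES (10, 1);
    INSERT INTO invoices (id, order_id) VALUES (100, 10);
    """)
    return conn

def delete_client(conn: sqlite3.Connection) -> tuple[bool, int]:
    """Try to delete client 1; report whether it succeeded and how many invoices remain."""
    try:
        conn.execute("DELETE FROM clients WHERE id = 1")
        ok = True
    except sqlite3.IntegrityError:
        ok = False
    remaining = conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
    return ok, remaining

print(delete_client(build("CASCADE")))   # (True, 0): the invoices are gone too
print(delete_client(build("RESTRICT")))  # (False, 1): delete refused, history kept
```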
Yes. The ON DELETE [RESTRICT|CASCADE] keeps developers from stranding data, keeping the data clean. I recently joined a team of Rails developers who did not focus on database constraints such as foreign keys.
Luckily, I found these: http://www.redhillonrails.org/foreign_key_associations.html -- RedHill on Ruby on Rails plug-ins generate foreign keys using the convention over configuration style. A migration with product_id will create a foreign key to the id in the products table.
Check out the other great plug-ins at RedHill, including migrations wrapped in transactions.
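The convention-over-configuration idea can be sketched in a few lines. This is not the RedHill plugin's code: column_ddl and create_table are hypothetical helpers, and the naive pluralization (product_id maps to products) is an assumption. SQLite via Python's sqlite3 is used only to show that the generated DDL is accepted.

```python
import sqlite3

def column_ddl(column: str, sql_type: str = "INTEGER") -> str:
    """Hypothetical convention: a column named <name>_id references <name>s(id)."""
    if column.endswith("_id"):
        parent = column[: -len("_id")] + "s"  # naive pluralization, e.g. product -> products
        return f"{column} {sql_type} REFERENCES {parent}(id)"
    return f"{column} {sql_type}"

def create_table(conn: sqlite3.Connection, table: str, columns: list[str]) -> str:
    """Build and execute CREATE TABLE, deriving FKs from column names; returns the DDL."""
    body = ",\n    ".join(["id INTEGER PRIMARY KEY"] + [column_ddl(c) for c in columns])
    ddl = f"CREATE TABLE {table} (\n    {body}\n)"
    conn.execute(ddl)
    return ddl

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
create_table(conn, "products", ["price"])
print(create_table(conn, "line_items", ["product_id", "quantity"]))
```

Real plugins have to handle irregular plurals and explicit overrides; the sketch only shows the core mapping from a naming convention to a declared constraint.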
If you plan on generating your data access code (e.g. with Entity Framework or any other ORM), you entirely lose the ability to generate a hierarchical model without foreign keys.
