Although I am targeting MySQL/PHP, for the sake of my questions, I'd like to just apply this generally to any relational database that is being used in conjunction with a modern programming language. Another assumption would be that the language is leveraging a modern framework, which, on some level would handle foreign key constraints implicitly or have a means to do so explicitly.
My questions:
What are the pros and cons of creating FK constraints in the database itself as opposed to managing them at the application level?
From a design standpoint, should they ever both be used together or would that cause conflict?
If they should not be used together, what is considered the "best practice" in regards to which approach to use?
Note: This is a design theory question. Because of the wide variety of technology that could be used to satisfy an implementation, I'm not really interested in any specifics regarding an implementation.
What are the pros and cons of creating FK constraints in the database itself as opposed to managing them at the application level?
In a concurrent environment, it is surprisingly difficult to implement referential integrity in application code in a way that is both correct and performs well.
Unless you very carefully use locking, you are open to race conditions, such as:
Imagine there is currently one row in the parent table and no corresponding rows in the child.
Transaction T1 inserts a row in the child table, but does not yet commit. It can do that since there is a corresponding row in the parent table.
Transaction T2 deletes the parent row. It can do that since there are no child rows from its perspective (T1 hasn't committed yet).
T1 and T2 commit.
At this point, you have a child row without parent (i.e. broken referential integrity).
To remedy that, you can lock the parent row from both transactions, but that's likely to be less performant compared to the highly optimized FK implemented in the DBMS itself.
On top of that, all your clients have to adhere to the same "locking protocol" (one misbehaving client is enough to corrupt the data). And the complexity rises rapidly if you have several levels of nested FKs or diamond-shaped FKs. Even if you implement referential integrity in triggers, you are only solving the "one misbehaving client" problem, but the rest remains.
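The interleaving described above can be reproduced deterministically. The following is a minimal sketch, assuming SQLite through Python's standard `sqlite3` module with no FK declared, so the application's own checks are the only guard. The table names and the `simulate_race` helper are hypothetical, and the two "transactions" are run as sequential steps in the order the race would interleave them rather than as real concurrent sessions.

```python
import sqlite3


def simulate_race():
    """Replay the T1/T2 interleaving with application-level checks only."""
    conn = sqlite3.connect(":memory:")
    # No FOREIGN KEY clause: integrity is left entirely to the application.
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO parent (id) VALUES (1)")

    # T1's check: the parent row exists, so inserting a child looks safe.
    t1_ok = conn.execute(
        "SELECT COUNT(*) FROM parent WHERE id = 1"
    ).fetchone()[0] == 1
    # T2's check: no child rows are visible yet, so deleting looks safe.
    t2_ok = conn.execute(
        "SELECT COUNT(*) FROM child WHERE parent_id = 1"
    ).fetchone()[0] == 0

    # Both checks passed before either write happened.
    if t1_ok:
        conn.execute("INSERT INTO child (id, parent_id) VALUES (100, 1)")
    if t2_ok:
        conn.execute("DELETE FROM parent WHERE id = 1")
    conn.commit()

    # Child rows whose parent no longer exists.
    orphans = conn.execute(
        "SELECT c.id FROM child c "
        "LEFT JOIN parent p ON p.id = c.parent_id "
        "WHERE p.id IS NULL"
    ).fetchall()
    conn.close()
    return orphans
```

Both checks pass because each is made before the other side's write is visible, which is exactly the window a database-level FK closes.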
Another nice thing about database-level FKs is that they usually support referential actions such as ON DELETE CASCADE. And all that is simple and self-documenting, unlike referential integrity buried inside application code.
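As an illustration of a referential action, here is a small sketch using SQLite via Python's `sqlite3` module. The `author`/`book` schema and the `cascade_demo` function are made up for the example; note that SQLite only enforces FKs after `PRAGMA foreign_keys = ON` is issued on the connection, whereas other engines enforce them by default.

```python
import sqlite3


def cascade_demo():
    """Return the number of child rows before and after deleting the parent."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this per connection
    conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        """
        CREATE TABLE book (
            id        INTEGER PRIMARY KEY,
            title     TEXT NOT NULL,
            author_id INTEGER NOT NULL
                      REFERENCES author(id) ON DELETE CASCADE
        )
        """
    )
    conn.execute("INSERT INTO author (id, name) VALUES (1, 'Some Author')")
    conn.executemany(
        "INSERT INTO book (id, title, author_id) VALUES (?, ?, ?)",
        [(1, "First", 1), (2, "Second", 1)],
    )

    before = conn.execute("SELECT COUNT(*) FROM book").fetchone()[0]
    # Deleting the parent removes its books; no application code is involved.
    conn.execute("DELETE FROM author WHERE id = 1")
    after = conn.execute("SELECT COUNT(*) FROM book").fetchone()[0]
    conn.close()
    return before, after
```

The rule lives in the table definition, so anyone reading the schema can see what happens to books when an author is deleted.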
From a design standpoint, should they ever both be used together or would that cause conflict?
You should always use database-level FKs. You could also use application level "pre-checks" if that benefits your user experience (i.e. you don't want to wait until the actual INSERT/UPDATE/DELETE to warn the user), but you should always code as if the INSERT/UPDATE/DELETE can fail even if your application-level check has passed.
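A rough sketch of that combination, assuming SQLite via Python's `sqlite3` module and a hypothetical `customer`/`orders` schema: the pre-check exists only to give the user a friendlier message, and the INSERT is still written as if it can fail.

```python
import sqlite3


def make_db():
    """Create a tiny hypothetical schema with one customer."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customer(id),"
        " item TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO customer (id) VALUES (1)")
    conn.commit()
    return conn


def add_order(conn, customer_id, item):
    """Return (ok, message). The pre-check is a courtesy, not a guarantee."""
    exists = conn.execute(
        "SELECT 1 FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    if exists is None:
        return False, "Unknown customer"
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
                (customer_id, item),
            )
    except sqlite3.IntegrityError:
        # The customer may have been deleted between the check and the INSERT.
        return False, "Customer no longer exists"
    return True, "Order saved"
```

The `except` branch is the part that must not be dropped: the database FK remains the authority, and the pre-check only improves the message in the common case.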
If they should not be used together, what is considered the "best practice" in regards to which approach to use?
As I stated, always use database-level FKs. Optionally, you may also use application-level FKs "on top" of them.
See also: Sql - Indirect Foreign Key
Just how familiar are you with database design and the foreign key concept in general? An FK is a column (or columns) in one table that identifies a row in another table. (I'm pretty sure you already know this.) So an FK constraint is something that exists in the DB, not in the application. Managing FK constraints in the application requires manually coding functionality that is already available in the DB. So why would you want to do all that manual labor? DB/application interaction and development also become much more difficult because of all that extra manual coding.
Best practice IMHO is to use tools for what they were created to do. The DB takes care of FK referential integrity, and the application doesn't need to concern itself with the DB's inner workings. However, if referential integrity is your main concern and you're, for example, using MySQL with the MyISAM engine, which doesn't support FK constraints, then you have to do some manual checking in the application (or maybe with DB triggers, which I am not familiar with). Just keep in mind that when you do all kinds of checking in the application you still have to access the DB, and thus you use more resources than would be needed if the DB could handle the referential integrity checks. (The easy solution, of course, would be to start using the InnoDB engine, but I'll stop here before this answer gets too product oriented.)
So some of the pros of letting the DB handle the FK constraint would be:
You don't have to think about it.
You don't have to manually code anything extra.
Application uses less resources and contains less code and thus...
... maintaining and developing both the DB and the application is a lot easier (for example the application developers don't need to understand database oriented concepts and functionalities so deeply, let the DB experts do the FK etc. thinking...).
What are the pros and cons of creating FK constraints in the database
itself as opposed to managing them at the application level?
Some of the pros of using db-enforced FKs:
Separation of schema from code.
Making application code smaller
No chance for programmer to mess with FK rules.
Forces other applications that integrate with the db to follow the fk rules.
Some of the cons of having db-enforced FKs.
Not easy to break if you have a special case
If data is not valid, errors could be thrown. The application should be coded to handle such errors gracefully (especially in batch processing).
Definition of FK with Referential integrity rules must be defined and coded carefully. You don't want to cascade delete 1000000 rows online.
They cause an implicit check, even if you don't want that check to occur because you know the parent row must exist. This has probably a trivial impact on performance. Performance is an issue when loading huge data volumes in batch loads and in OLAP/Data Warehouse systems. Special load tools are used and constraints such as database enforced FKs are usually disabled during the load.
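The "disable during load, verify afterwards" pattern can be sketched as follows. This is an assumption-laden illustration using SQLite via Python's `sqlite3` module, not what dedicated load tools do internally: `PRAGMA foreign_keys` and `PRAGMA foreign_key_check` are SQLite-specific, and the `parent`/`child` schema and the `bulk_load` helper are hypothetical.

```python
import sqlite3


def make_db():
    """Hypothetical parent/child schema with a single parent row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE child ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER NOT NULL REFERENCES parent(id))"
    )
    conn.execute("INSERT INTO parent (id) VALUES (1)")
    conn.commit()
    return conn


def bulk_load(conn, rows):
    """Load child rows with FK enforcement off, then report any violations."""
    conn.commit()  # the pragma has no effect inside an open transaction
    conn.execute("PRAGMA foreign_keys = OFF")
    with conn:  # one transaction for the whole batch
        conn.executemany(
            "INSERT INTO child (id, parent_id) VALUES (?, ?)", rows
        )
    # One result row per child row whose parent is missing.
    violations = conn.execute("PRAGMA foreign_key_check").fetchall()
    conn.execute("PRAGMA foreign_keys = ON")
    return violations
```

Re-enabling enforcement does not repair rows that were loaded while it was off, so the explicit check (or an equivalent validation step in other engines) is what tells you whether the batch was clean.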
From a design standpoint, should they ever both be used together or
would that cause conflict?
You could use them together for a reason. As I mentioned before, you may have special cases in your data that you can't define FKs for. Also, there are certain cases such as many-to-many self referencing relationships between tables that could not be handled by FKs (for some db engines at least).
I'm currently writing my bachelor's thesis and have made an upgraded ER-Model for the database based on what I need for the implementation.
My problem is that the database in my company is basically based on triggers and there is no actual ER-Model which I could use. Is it even possible to make an ER-Model based on a database which is pretty much only using triggers to interact with the tables inside? There are pretty much no foreign keys.
Thanks for your answers,
Cheers
the database [is] based on triggers and there is no actual ER-Model which I could use....
There are pretty much no foreign keys.
I must say it sounds like you are being badly advised. You have an academic project, the design of which does not use conventional foreign keys, and that cannot be modeled with an entity relationship diagram?
only using triggers to interact with the tables
Triggers were invented before DRI was defined in the SQL standard. IIRC, they were invented by Sybase, around 1986. If their use is restricted to enforcing referential integrity constraints -- as should be -- they will be used sparingly. Most RI enforcement since the advent of SQL-92 is readily and preferably supplied declaratively in the database schema. Triggers today are properly seen as obsolete and exotic: largely superseded by DRI, and occasionally useful as a workaround in weird situations.
Can database interaction be only through triggers? Trivially, no. A trigger cannot insert new data. Without ridiculous gyrations, a trigger cannot select data to be returned to the application. But in any case that's barely a database design issue: the observations hold, no matter the tables in question.
After a few discussions with a colleague, we still do not agree on this topic.
In my opinion it makes more sense to create a properly designed database that includes all the relations.
I'm not really experienced in this area, which is why I'm asking you.
Advantages in my opinion
- No "wrong" inserts because of the relation conflicts in the Database
- Database and program are strictly separated
- Several programs using the same data source require less work to customize
- Making the use of LINQ much easier
- and many more.... ?
Possible disadvantages of this way?
What are the advantages of not related Tables?
Transactional systems should "always" have the referential integrity enforced as close to the database as possible. Most people would agree that this is best done right inside the database itself. You have correctly recognized many of the advantages of letting the DBMS enforce referential integrity.
I said "always" above because I believe in common sense and deliberate decisions not rules of thumb.
One reason why someone may not want to enforce referential integrity within the database is that you have a cyclical relationship where the parent and the child need to point to each other and it is not possible to insert one record because the other isn't there yet. This leaves you with a so-called catch-22. In this case, you may need to enforce the referential integrity in program logic. Still, the best place for this is in the data layer, not in the application layer.
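Where the DBMS supports deferred constraint checking, the catch-22 can sometimes be resolved without leaving the database: the FK check is postponed until COMMIT, so both rows can be inserted in either order inside one transaction. Not every engine offers this, and the answer above does not discuss it, so treat the following as a sketch under that assumption. It uses SQLite via Python's `sqlite3` module, and the `department`/`employee` schema is hypothetical.

```python
import sqlite3


def make_cyclic_db():
    """Two tables that reference each other, with checks deferred to COMMIT."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        """
        CREATE TABLE department (
            id      INTEGER PRIMARY KEY,
            head_id INTEGER NOT NULL
                    REFERENCES employee(id) DEFERRABLE INITIALLY DEFERRED
        )
        """
    )
    conn.execute(
        """
        CREATE TABLE employee (
            id            INTEGER PRIMARY KEY,
            department_id INTEGER NOT NULL
                          REFERENCES department(id) DEFERRABLE INITIALLY DEFERRED
        )
        """
    )
    return conn


def insert_pair(conn, dept_id, emp_id, head_id):
    """Insert a department and an employee together; True if COMMIT succeeds."""
    try:
        with conn:  # the deferred FK checks run when this block commits
            conn.execute(
                "INSERT INTO department (id, head_id) VALUES (?, ?)",
                (dept_id, head_id),
            )
            conn.execute(
                "INSERT INTO employee (id, department_id) VALUES (?, ?)",
                (emp_id, dept_id),
            )
        return True
    except sqlite3.IntegrityError:
        conn.rollback()  # make sure nothing from the failed attempt lingers
        return False
```

With immediate checking, the first INSERT would fail because the referenced employee does not exist yet; with deferred checking, only the state at COMMIT matters, and a dangling `head_id` is still rejected.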
Another reason why some people don't worry about referential integrity is when the data is read-only. This can happen in a reporting database or data warehouse. Referential integrity in the database creates indexes which are used to enforce the relationships. This can sometimes be a space issue, but more often it is just a problem with making the data warehouse load harder because of the order of operations required.
One more reason why referential integrity is sometimes not used is that archiving old transactional data can get tricky because of complex interrelationships between master tables and transaction tables. You can easily find yourself in a position where it's impossible to delete any data, no matter how old it is, because it is somehow related to something that is related to another thing that is needed by something current.
Having said all of this you should definitely start from the position of using referential integrity features of your database and only back away from this if you have a really good, well considered reason.
Of course!!! You must enforce referential integrity within your database model! It is safer and more efficient, it guarantees data integrity, and you do not rely on the programmer. No discussion here.
Not related tables are ONLY usable if you are just building a "reporting db" that downloads nightly data from various systems, for example.
I was wondering how useful foreign keys really are in a database. Essentially, if the developers know what keys the different tables depend on, they can write the queries just as though there was a foreign key, right?
Also, I do see how to foreign-key constraints help prevent all sorts of bugs with data integrity, but say for example, the programmers do a good job of preserving data integrity, how necessary are foreign keys really?
If you don't care about referential integrity then you are right. But.... you should care about referential integrity.
The problem is that people make mistakes. Computers do not.
Regarding your comment:
but say for example, the programmers do a good job of
preserving data integrity
Someone will eventually make a mistake. No one is perfect. Also if you bring someone new in you aren't always sure of their ability to write "perfect" code.
In addition to that you lose the ability to do cascading deletes and a number of other features that having defined foreign keys allow.
I think that assuming that programmers will always preserve data integrity is a risky assumption.
There's no reason why you wouldn't create foreign keys, and being able to guarantee integrity instead of just hoping for integrity is reason enough.
Not using referential integrity in a database is like not using seatbelts in cars. It will provide you with measurable improvements in taking you from A->B, but it will make "real" difference only in the most extreme cases. Why take the "risk" unless you really have to?
The underlying reason people ask this question is always performance.
Foreign keys give the optimizer much more information to work with, and it will potentially produce better execution plans. It's not that a specific query will be some fixed percentage faster with constraints enabled; it's more that you effectively eliminate entire classes of problems caused by bad execution plans. You also enable the optimizer to rewrite queries in ways that just aren't possible without the constraints (join elimination, for example).
Starting right here, I would like to start a myth that referential integrity always increases performance in databases. I'm fairly confident that if 100 people designed their databases with full integrity checking, fewer than 5 would actually have to consider spending a whopping 1 second to disable them for performance reasons. Out of those 5 people, close to 0 would find that they need to disable 100% of the constraints.
Foreign keys are invaluable as a means of ensuring integrity, and even if you trust your developers to never (!) make errors the cost of having them is usually well worth it.
Foreign keys also serve as documentation, in that you can see what relates to what. This information is typically also used by tools, such as for generating reports, creating data sets from table definitions, object-relational mappers, etc. Even if you do not use any of these today, having FKs will make it easier to tread that path later.
Foreign keys also allow you to define cascade rules, which can be used, for example, to delete associated records in related tables when a row in one table is deleted.
Only if you have ridiculously high loads should you consider bypassing FKs.
Edit: updated answer to include points from other answers (reports, cascades).
You said
but say for example, the programmers
do a good job of preserving data
integrity
The expression you were looking for is, "I'm 100% certain that every programmer and every database administrator will manually preserve data integrity perfectly no matter what application touches this database, no matter how complex the database becomes, from now until the time it's decommissioned."
You don't have to use them but why wouldn't you?
They are there to help. From making life easier with cascade updates and cascade deletes, to guaranteeing that constraints aren't violated.
Maybe the application honors the constraints, but isn't it useful to have them clearly specified? You could document them, or you could put them in the database where most programmers expect to find constraints they are supposed to conform to (a better idea I think!).
Finally, if you ever need to import data into this database which doesn't go via the front-end, you may accidentally import data which violates the constraints and breaks the application.
I'd definitely not recommend skipping the relationships in a database.
Foreign keys make life so much easier when using report builders and data analysis tools. Just select one table, check the include related tables box and BAM! You've got your report built. OK, OK, it's not that easy, but they certainly save time in that respect.
Use constraints rather than application logic to enforce integrity because it is generally easier, cheaper and more reliable to maintain constraints in one place (the database) rather than in every application.
I understand from one of your comments that your motivation for asking the question is that you think leaving out the keys may make it easier to evolve the database design during development. In my experience you are wrong about that. I find that it's actually better to be more restrictive with constraints in the early stages of development. If in doubt, create the constraint because it's much easier to remove constraints later than it is to create them. Removing a constraint will tend to break fewer things than adding one and generally requires less testing and fewer code changes to achieve.
Another point to make is that when you scrap your current user interface and use a new one with shiny new tools, you won't lose your referential integrity because the new devs have no idea what should be related to what. Databases are generally in use much much longer than user interfaces. They are also often used by more than one application interface and then you have the problem of different interfaces trying to enforce different integrity rules.
I will also point out that I have had occasion to look at the data in, quite literally, hundreds of databases and have not found one yet that has good data if they didn't set up FKs. This bad data complicates reporting, it complicates imports and exports to and from clients and other third party vendors who need or provide the data. And if the bad data is in a financial area, it could also have legal and accounting implications. I can even remember one time the company had thousands of bad inventory records where the actual product that was stored was no longer identifiable (nor the location) which also created issues with defining the value of the inventory necessary for financial reporting. This is not only bad from a perspective of not knowing what parts you have on hand, but it enables people to steal parts without being caught simply by deleting the part number from the part table (this particular place didn't have auditing in place either.).
Folks have offered up some good answers above. However, one important point I didn't see mentioned is that foreign keys make your entity relationship diagrams (ERDs) easier to generate and much more meaningful. Without FKs, you either need to depict the FK relationships on your ERD manually (painful for you) or not at all (painful for others, and perhaps even for yourself once your memory of the implied FK relationships starts to fade over time). With FKs explicitly defined, most tools that automatically generate ERDs from database object definitions will automatically detect and depict the FK relationships.
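To make concrete what such tools rely on, here is a small sketch that reads the declared FK relationships back out of the catalog. It assumes SQLite via Python's `sqlite3` module and uses the SQLite-specific `PRAGMA foreign_key_list`; other engines expose the same information through their own catalogs (for example `information_schema`). The `list_relationships` helper is hypothetical.

```python
import sqlite3


def list_relationships(conn):
    """Return (child_table, child_column, parent_table, parent_column) tuples."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    edges = []
    for table in tables:
        # Result columns: id, seq, table, from, to, on_update, on_delete, match
        for row in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            edges.append((table, row[3], row[2], row[4]))
    return edges
```

Each tuple is one edge of the diagram. Without declared FKs the list is empty, and the relationships exist only in the developers' heads.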
Perhaps the question should be "How bad are orphan records?". In many cases orphaned records aren't really going to hurt anything. Yes, these records may persist until the end of time, but how bad is this really? Cascading updates or deletes are rarely useful features. Referential integrity sounds nice but I think it is not as important as we have been led to believe. The biggest benefit of FKs is the documentation they provide. In my experience FKs for referential integrity are way more trouble than they are worth.
I am having the same question today, and found many articles talking about why you don't have to use foreign keys online. But so far, 10 of 11 answers here say you should have FKs.
I am not a db expert and just want to share some points I found online about when and why you don't have FKs:
Some points from 9 reasons why there are no foreign keys constraints:
Performance
Legacy data
Full table reload
Higher level framework
Cross database relations
Database platform agnostic
Open for change
Lazy architect
Keep model a secret
Some points from At GitHub we do not use foreign keys, ever, anywhere.
FKs are in your way to shard your database.
FKs are a performance impact.
FKs don't work well with online schema migrations.
Note: I don't have any opinions. Just sharing some online articles to provide a different answer to most of the current ones.
Letting the database enforce foreign keys is causing a lot of trouble during development. Especially during unit tests, I can’t drop a table because of foreign key constraints, and I need to create tables in an order that won’t trigger foreign key constraint warnings. In reality I don’t see much point in letting the database enforce the foreign key constraints. If the application has been properly designed there should not be any manual database manipulation other than select queries. I just want to make sure that I am not digging myself into a hole by not having foreign key constraints in the database and leaving it solely to the application’s responsibility. Am I missing anything?
P.S. my real unit tests (not those that use mocking) will drop existing tables if the structure of underlying domain object has been modified.
In my experience, if you don't enforce foreign keys in a database, then eventually (assuming the database is relatively large and heavily used) you will end up with orphaned records. This can happen in many ways, but it always seems to happen.
If you index properly, there should not be any performance advantages to foreign keys.
So the question is, does the potential damage/hassle/support cost/financial cost of having orphaned records in your database outweigh the development and testing hassle?
In my experience, for business applications I always use foreign keys. It should just be a one-time setup cost to get your build scripts working correctly, and the data stability will more than pay for that over the life of an application.
The point of enforcing the rules in the database is that it's declarative, i.e. you do not have to write a ton of code to handle it.
As for your unit tests, just delete tables in the proper order. You just have to write a function to do it right once.
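Such a function amounts to a topological sort of the tables by their FK dependencies. A minimal sketch, assuming Python 3.9+ for the standard `graphlib` module; the `drop_order` helper and the shape of its `references` argument are made up for illustration.

```python
from graphlib import TopologicalSorter


def drop_order(references):
    """Return table names ordered so that children come before their parents.

    `references` maps each table to the set of tables it references via FKs.
    """
    sorter = TopologicalSorter()
    for table, parents in references.items():
        sorter.add(table)  # register tables that have no FKs at all
        for parent in parents:
            if parent != table:  # a self-reference does not affect the order
                # The referencing table must be dropped before the parent.
                sorter.add(parent, table)
    return list(sorter.static_order())
```

For `{"customer": set(), "orders": {"customer"}, "order_line": {"orders"}}` this gives `order_line`, `orders`, `customer`; reversing the list gives a valid creation order. A genuine cycle between tables raises `graphlib.CycleError`, which is a sign that the constraint has to be dropped (or deferred) before the tables.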
Your issues in development should not drive the DB design. Constantly rebuilding a DB is a developer use case, not a customer use case.
Also, the DB constraints help beyond your application. You never know what your customer might try to do. Don't overdo it, but you need a few.
It might seem like you can rely on your applications to follow implied rules, but unless you enforce them eventually someone will make a mistake.
Or maybe 5 years from now someone will do a tidy-up of old records "which are no longer needed" and not realise that there is data in other tables still referencing them. Then a few days/weeks later you or your successor gets the fun job of trying to repair the mess that the database has got in to. :-)
Here's a nice discussion on that in a previous question on SO: What's wrong with foreign keys?. [Edit]: The argument is to make non-enforced foreign keys to get some of the pros if any of the cons apply.
If the application has been properly
designed there should not be any
manual database manipulation other
than select queries
What? What kind of koolaid are you drinking? Most databases applications exist to manipulate the data in the database not just to see it. Generally the whole purpose of the application is to add the new orders or create the new customer records or document the customer service calls etc.
Foreign keys are for data integrity. Data integrity is critical to being able to use the data with any reliability. Databases without data integrity are useless and can cause companies to lose money. This trumps your self-centered view that FKs aren't needed because they make development more complicated for you. The data is far more important than your convenience in running tests (which can be written to account for the FKs).
How compatible are ORMs with existing databases that have a lot of constraints (particularly unique key constraints/unique indexes beyond primary keys) enforced within the database itself?
(Often these are preexisting databases, shared by numerous legacy applications. But good database modeling practice is to define as many constraints as possible in the database, as a double-check on the applications. Also note that the database engine I am working with does not support deferred constraint checking.)
The reason I am asking is that the ORMs I have looked into, NHibernate and Linq to SQL, don't seem to hold up very well in the presence of database unique constraints. For example, deleting a row and re-inserting one with the same business key results in a foreign key exception. (There are subtle, harder to avoid examples as well.) The ORMs observe primary key and foreign key constraints, but tend to be oblivious to unique constraints.
I understand that there are workarounds, such as the NHibernate flush method. However, I feel this is an extremely leaky abstraction and makes it hard to design the application with regard to separation of concerns. Ideally, all of the objects can be manipulated in memory by subroutines and then the main routine can take responsibility for the call to actually sync the database. This isolates the update and allows for custom logic to inspect all of the updates before they are actually submitted to the database.
Executing the commands in the correct order is non-trivial. See my question here. Nonetheless, I was expecting better support for the common cases among the popular ORMs. This seems so important for introducing an ORM into an existing environment.
What have been your experiences with using ORM technologies in light of these issues?
This is of course IMHO...
ORM in general treats databases as merely a storage medium for data and is geared towards maintaining the constraints/business logic in the "O" side and not the "R" side. I haven't seen any ORM products that make use of some of the more "hardcore" relational database concepts like alternate keys, composite unique indexes, and exclusive subtypes. In a sense, ORM makes the database a second class citizen.
Call me old fashioned, but ORM seems to be good for reading data but for writing data back to a non-trivial relational design, I've always found it falls short. I prefer to do all my updates through SQL and/or stored procedures.
Good ORMs, and NHibernate is one, will enforce referential integrity and proper order execution if the database is mapped correctly. As far as I know, none of them support check or unique constraints. Check constraints are business rules that should be enforced in the business objects. I usually only enforce critical business rules (i.e. the business would lose money and/or I would lose my job if these rules were violated) in the database using check constraints and/or triggers.
Unique constraints usually represent an alternate key. With ORMs, it's common practice to use a surrogate key (identity) as the primary key and enforce a unique constraint on the natural key. It would be challenging for an ORM to implement unique constraint checking because it would require a select and lock before every insert or update. In general, the best practice is to always perform operations in a transaction that can be rolled back if it fails and provide a meaningful error message to the user.
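The pattern in the last two sentences (surrogate primary key, unique constraint on the natural key, and a transaction that rolls back with a readable message) might look like the following outside of any ORM. This is only a sketch using SQLite via Python's `sqlite3` module; the `product` table, the `sku` column, and the `add_product` helper are hypothetical.

```python
import sqlite3


def make_db():
    """Hypothetical table with a surrogate key and a unique natural key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE product (
            id   INTEGER PRIMARY KEY,      -- surrogate key
            sku  TEXT NOT NULL UNIQUE,     -- natural (business) key
            name TEXT NOT NULL
        )
        """
    )
    return conn


def add_product(conn, sku, name):
    """Return (new_id, None) on success or (None, message) on a conflict."""
    try:
        with conn:  # commits on success, rolls back if the INSERT fails
            cur = conn.execute(
                "INSERT INTO product (sku, name) VALUES (?, ?)", (sku, name)
            )
        return cur.lastrowid, None
    except sqlite3.IntegrityError:
        # No select-and-lock beforehand: the database makes the decision.
        return None, f"A product with SKU {sku!r} already exists."
```

The database does the uniqueness check atomically as part of the INSERT, so the application only has to translate the failure into something the user can act on.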
For example, deleting a row and re-inserting one with the same business key results in a foreign key exception.
Were you trying to do this in the scope of a single ISession? I could see that being problematic.