Relationship between tables to prevent insertion of records where foreign key doesn't exist - sql-server

Hi I've set up two very basic tables. One table will act as a look up, with an identity field as a primary key. The other table uses the look up ID as a foreign key.
I have created a relationship constraint so now I cannot delete from the look up if the foreign key is used in the "main" table.
However, my issue is that I can add a record with a foreign key that doesn't exist.
To my way of thinking this shouldn't be allowed, can anyone tell me what setting I need to use to enforce this and whether this is typical database design or not?
Thanks Dave

Your way of thinking is correct. Good database design provides some way of enforcing what is called "Referential Integrity". This is simply a buzzword for the concept you have derived on your own, namely that a foreign key should be rejected if it refers to a nonexistent row. For a general discussion of referential integrity, see the following Wikipedia article. It's short.
http://en.wikipedia.org/wiki/Referential_integrity
Some programmers would like to enforce referential integrity inside their programs. In general, it's a much better plan to define a referential integrity constraint inside the database, and let the DBMS do the enforcement. It's easier, it's faster, and it's more effective.
The SQL Data Definition Language (DDL) provides a way to declare a foreign key constraint when you create a table. The syntax differs a little between different dialects of SQL, but it's basically the same idea in all of them. Here's a capsule summary.
http://www.w3schools.com/sql/sql_foreignkey.asp
The documentation for SQL Server should have a description of the referential integrity constraint under the CREATE TABLE command.
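As a rough illustration of what a declared constraint buys you, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the example is runnable; the table and column names are hypothetical). The REFERENCES clause is the part that matters. Note that SQLite only enforces foreign keys once the pragma is switched on, which may not match your engine's defaults.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical lookup table with an integer primary key.
conn.execute("CREATE TABLE status_lookup (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# Hypothetical "main" table whose status_id must match a lookup row.
conn.execute(
    "CREATE TABLE main_record ("
    " id INTEGER PRIMARY KEY,"
    " status_id INTEGER NOT NULL REFERENCES status_lookup(id))"
)

conn.execute("INSERT INTO status_lookup (id, name) VALUES (1, 'open')")
conn.execute("INSERT INTO main_record (status_id) VALUES (1)")  # key exists: accepted


def try_sql(sql):
    """Run one statement; return True if accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Insert with a key that is not in the lookup table.
insert_missing_key = try_sql("INSERT INTO main_record (status_id) VALUES (99)")
# Delete a lookup row that the main table still uses.
delete_used_lookup = try_sql("DELETE FROM status_lookup WHERE id = 1")
print(insert_missing_key, delete_used_lookup)
```

With the constraint in place, both the insert of an unknown key and the delete of a lookup row still in use should be rejected by the database itself.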

Related

Creating a SQL database without defining primary key

So in my work environment we don't use a 'primary key' as defined by SQL Server. In other words, we don't right click a column and select "set as primary key".
We do however still have primary keys, we just use a unique ID column. In stored procedures we use these to access the data like you would in any relational database.
My question is: other than the built-in functionality that comes with defining a primary key in SQL Server (Entity Framework support, etc.), is there a good reason to use the 'primary key' functionality over just using a unique ID column and accessing your tables with that in your own stored procedures?
The biggest drawback I see (again other than being able to use Entity Framework and things like that) is that you have to mentally keep track or otherwise keep track of what ID relates to what tables.
There is nothing "special" about the PRIMARY KEY constraint. It's just a uniqueness constraint and you can achieve the same results by using the UNIQUE NOT NULL syntax to define your keys instead.
However, uniqueness constraints (i.e. keys in general, not "primary" keys specifically) are very important for data integrity reasons. They ensure that your data is unique which means that sensible, meaningful results can be derived from your data. It's extremely difficult to get accurate results from a database that contains duplicate data. Also, uniqueness constraints are required to enforce referential integrity between tables, which is another very important aspect of data integrity. Poor data integrity is a data management problem that costs businesses billions of dollars every year and that's the bottom line of why keys are important.
There is a further reason why unique indexes are important: query optimization and performance. Unique indexes improve query performance. If your data is supposed to be unique then creating a unique index on it will give the query optimizer the best chance of picking a good execution plan for your queries.
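To make the "it's just a uniqueness constraint" point concrete, here is a small sketch, again using Python's sqlite3 module as a stand-in engine (an assumption; the tables are hypothetical). It keys both tables with UNIQUE NOT NULL instead of PRIMARY KEY and checks that duplicates, NULLs, and dangling references are all rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys

# Hypothetical table keyed by UNIQUE NOT NULL instead of PRIMARY KEY.
conn.execute("CREATE TABLE customer (customer_code TEXT UNIQUE NOT NULL, name TEXT)")
# A foreign key can reference a column that carries a uniqueness constraint.
conn.execute(
    "CREATE TABLE customer_order ("
    " order_no INTEGER UNIQUE NOT NULL,"
    " customer_code TEXT NOT NULL REFERENCES customer(customer_code))"
)
conn.execute("INSERT INTO customer VALUES ('C1', 'Alice')")


def rejected(sql):
    """Return True if the statement is refused by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_rejected = rejected("INSERT INTO customer VALUES ('C1', 'Bob')")
null_rejected = rejected("INSERT INTO customer VALUES (NULL, 'Carol')")
orphan_rejected = rejected("INSERT INTO customer_order VALUES (1, 'C9')")
print(duplicate_rejected, null_rejected, orphan_rejected)
```

What this does not show is anything PRIMARY KEY adds on top in a particular engine (tooling support, default clustering, and so on); that part is engine specific.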
I think the drawback is not using the primary key at all and using a unique key constraint for something it wasn't intended to do.
Unique keys: You can have many of them. They are meant to offer a way to determine uniqueness among rows.
Primary key: like the Highlander, there can only be one. Its intended use is to identify the rows of the table.
I can't think of any good reason not to use a primary key. My opinion is that without a primary key, your table isn't actually a table. It's just a lump of data.
Follow Up: If you don't believe me, check out this guy who asked a bunch of DBAs if it was OK not to use a primary key.
Is it OK not to use a Primary Key When I don't Need one
There are philosophical and practical answers to your question.
The practical answer is that using the primary key constraint enforces "not null", and "unique". This protects you from application-level bugs.
The philosophical answer is that you want developers to operate at the highest possible level of abstraction, so that they don't have to stuff their brain full of detail when trying to solve problems.
Primary and foreign keys are abstractions that allow us to make assumptions about the underlying data model. We can think in terms of (business) entities, and their relationships.
In your workplace, you're forcing developers to think in terms of tables and indexes and conventions. You no longer think about "customers" and "orders" and "line items", but about software artefacts that represent those business entities, and the "we always represent uniqueness by a combination of a GUID and unique index" rule. That mental model is already complicated enough in most applications; you're just making it harder for yourselves, especially when bringing new developers into the team.

Do we need to explicitly mention which column is foreign key column?

When we create relational database tables, we have to use foreign key columns. It is obvious, otherwise we can not create relationships.
However, I noticed that it is enough to have a foreign key column, you do not need to say that there is a foreign key relationship in table A with table B.
As long as you can write the queries you can retrieve the data.
Do we use this concept to make things easy? I know that when I look at a database table schema which marks which columns are foreign key columns, it is easy to understand and start working with it.
Are there any other reasons?
The point is Referential integrity. If you don't enforce it, sooner or later a bug in the code or some other accident happens and your database is left in an inconsistent state. These inconsistencies are very hard or impossible to fix afterwards.
When we create relational database tables, we have to use foreign key
columns. It is obvious, otherwise we can not create relationships.
Incorrect. You do not need to create foreign keys (though it's a good idea), and they do not represent relationships. They enforce the integrity of the relationship. A foreign key makes sure that a value in one column exists in another column.
However, I noticed that it is enough to have a foreign key column, you
do not need to say that there is a foreign key relationship in table A
with table B. As long as you can write the queries you can retrieve the data.
Yes, the relationship is based on the data itself, not on the declaration of a foreign key. Also, foreign keys do not need to be between two tables; a table can have a foreign key to itself.
Do we use this concept to make things easy?
No, we use foreign keys to enforce integrity. That they happen to make ER diagrams easier to understand is simply a bonus.
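A small sketch of what "you can still write the queries" looks like in practice when nothing is declared. It uses Python's sqlite3 module and hypothetical department/employee tables (both are assumptions for illustration). The join runs fine, but an inconsistent row slips in unnoticed and has to be hunted down afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables: dept_id is a foreign key by convention only, nothing is declared.
conn.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER)")
conn.execute("INSERT INTO department VALUES (1, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'Ann', 1)")
# Nothing stops a row that points at a department that does not exist.
conn.execute("INSERT INTO employee VALUES (2, 'Bob', 42)")

# The join works without any declared constraint, but silently drops Bob.
joined = conn.execute(
    "SELECT e.name, d.name FROM employee e JOIN department d ON d.id = e.dept_id"
).fetchall()

# Finding the inconsistent rows afterwards needs an explicit check.
orphans = conn.execute(
    "SELECT e.name FROM employee e LEFT JOIN department d ON d.id = e.dept_id"
    " WHERE d.id IS NULL"
).fetchall()
print(joined, orphans)
```

Declaring the constraint would have refused the second employee row at insert time, which is the "enforce integrity" part of the answer above.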

Do databases use foreign keys transparently?

Do database engines utilize foreign keys transparently, or should a query explicitly use them?
Based on my experience there is no explicit notion of foreign keys on a table, except for a constraint that maintains uniqueness of the key, and the fact that the key (a single field or a group of fields) is a key, which makes searches efficient.
To clarify this, here is an example of why it is important: I have a middleware (in particular ArcGIS in my case), for which I can control the back-end database (so I can create keys, indices, etc.) and I usually use the front end (a RESTful API here). The middleware itself is a black box and is meant to provide effective tools to take advantage of the underlying DBMS's capabilities. So what I want to understand is this: if I build foreign key constraints and issue requests that, implemented normally, would translate into queries using those foreign keys, should I see performance improvements?
Is that generally the case, or do various engines do it differently? (I am using PostgreSQL.)
Foreign keys aren't there to improve performance. They're there to enforce data integrity. They will decrease performance for inserts/updates/deletes, but by themselves they generally make no difference to queries.
Some DBMSs will automatically add an index to the foreign key field, which may be where the confusion is coming from. Postgres does not do this; you'll need to create the index yourself. (And yes, the database will use this index transparently.)
As far as I know, database engines need specific queries to use foreign keys. You have to write some sort of join query to get data from related tables.
However, some data access frameworks hide the complexity of accessing data through foreign keys by providing a transparent way of accessing data from related tables, but I am not sure that provides much improvement in performance.
This depends entirely on the database engine.
In PostgreSQL constraints won't cause performance improvements directly, only indexes will do that.
CREATE INDEX is a PostgreSQL language extension. There are no provisions for indexes in the SQL standard.
However, adding some constraints will automatically create an index for the affected column(s). For example, UNIQUE and PRIMARY KEY constraints create a btree index on the affected column(s).
The FOREIGN KEY constraint won't create indexes on the referencing column(s), but:
A foreign key must reference columns that either are a primary key or form a unique constraint. This means that the referenced columns always have an index (the one underlying the primary key or unique constraint); so checks on whether a referencing row has a match will be efficient. Since a DELETE of a row from the referenced table or an UPDATE of a referenced column will require a scan of the referencing table for rows matching the old value, it is often a good idea to index the referencing columns too. Because this is not always needed, and there are many choices available on how to index, declaration of a foreign key constraint does not automatically create an index on the referencing columns.
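Here is a hedged sketch of the "index the referencing columns yourself" advice. It runs against SQLite through Python's sqlite3 module rather than PostgreSQL (an assumption made so the example is self-contained; plan output and planner behaviour differ between engines), and the parent/child tables and the index name are hypothetical. It compares the query plan for a lookup on the referencing column before and after creating the index by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE child ("
    " id INTEGER PRIMARY KEY,"
    " payload TEXT,"
    " parent_id INTEGER NOT NULL REFERENCES parent(id))"
)


def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN, joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT payload FROM child WHERE parent_id = 1"
plan_without_index = plan(lookup)

# Index the referencing column by hand; declaring the constraint did not create one.
conn.execute("CREATE INDEX idx_child_parent_id ON child(parent_id)")
plan_with_index = plan(lookup)

print(plan_without_index)
print(plan_with_index)
```

The first plan should describe a scan of the child table and the second should mention the new index. The query text itself is unchanged, which is the "transparent" part: the engine decides on its own whether to use the index.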

Obtain referential integrity at the expense of 2NF- is it a reasonable trade off?

Consider the following two tables:
Table A: [K1, K2, PropA]
Table B: [K3, PropB]
The primary key of Table A is composite [K1,K2].
The primary key of table B is K3.
Table A has an inclusion dependency on table B: values in K3 have to be matched by a value in K2. Unfortunately, since K2 is not a unique primary key, I can't define a foreign key constraint on these columns.
As I see it, the solution is to either enforce it in the application layer or to propagate the K1 column to table B, so that it will contain the entire foreign key of table A.
My question: is this considered a good or bad practice in DB design? Assume that adding the extra column is not a problem from a maintenance perspective or integrity perspective (insert is transactional).
I am using MSSQL and Oracle.
Create table C, with just one attribute, K2, which is the primary key. Now you can reference K2 with a foreign key constraint.
It would probably be a bad idea to put the K1 attribute into table B. If I've understood you correctly doing that looks like it would create a non-key dependency in violation of 2NF.
Your question does raise an important point: the often poor support for data integrity constraints offered in modern database software means database designers sometimes have to make some awkward compromises.
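A minimal sketch of the "table C" suggestion, using Python's sqlite3 module as a stand-in for MSSQL or Oracle (an assumption; the lower-case table and column names simply mirror the question). Both A.K2 and B.K3 reference C, so neither table needs an extra column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys

# Table C holds each K2 value exactly once, so it can be a foreign key target.
conn.execute("CREATE TABLE c (k2 INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE a ("
    " k1 INTEGER NOT NULL,"
    " k2 INTEGER NOT NULL REFERENCES c(k2),"
    " prop_a TEXT,"
    " PRIMARY KEY (k1, k2))"
)
conn.execute("CREATE TABLE b (k3 INTEGER PRIMARY KEY REFERENCES c(k2), prop_b TEXT)")

conn.execute("INSERT INTO c VALUES (10)")
conn.execute("INSERT INTO a VALUES (1, 10, 'x')")
conn.execute("INSERT INTO a VALUES (2, 10, 'y')")  # K2 repeats in A, which is fine
conn.execute("INSERT INTO b VALUES (10, 'ok')")

# A K3 value with no matching K2 in C is refused.
try:
    conn.execute("INSERT INTO b VALUES (11, 'no match')")
    unmatched_k3_accepted = True
except sqlite3.IntegrityError:
    unmatched_k3_accepted = False
print(unmatched_k3_accepted)
```

One caveat with this sketch: the constraints only guarantee that K3 appears in C. A value that sits in C but is not used by any row of A can still be referenced from B, so C has to be maintained to match what you actually want to allow.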
"... the solution is to either enforce it in the application layer or to propagate ..."
You see right or you see wrong, depending on perspective.
The true solution is for the DBMS vendors to support the general case of inclusion dependencies, of which the foreign key is only a special case.
But as long as the DBMS vendors don't do that, you are right, as long as you keep restraining yourself to those vendors' offerings, you'll have no other choice than to move constraint enforcement to the application [*], or screw up the design with all those ugly hacks you need in order to get your constraint enforced using just FKs.
A system that offers you a solution involving neither "enforcement in the application" nor "screw up the design" is at shark.armchair.mb.ca/~erwin. Disclosure: that project is of my own making.
[*] or put all the constraint enforcement code in triggers, such that at least a new application cannot "forget" to enforce some given existing business rule.

Should you make a self-referencing table column a foreign key?

For example to create a hierarchy of categories you use a column 'parent_id', which points to another category in the same table.
Should this be a foreign key? What would the dis/advantages be?
Yes. Ensures that you don't have an orphan (entry with no parent), and depending on usage, if you define a cascading delete, when a parent is deleted, all its children will also be deleted.
Disadvantage would be a slight performance hit just like any other foreign key.
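A short sketch of the self-referencing case, using Python's sqlite3 module and a hypothetical category table (both are assumptions for illustration). The parent_id column references the same table, NULL marks a top-level category, and ON DELETE CASCADE is included to show the cascading behaviour mentioned above. Some engines restrict cascades on self-references, so check what your DBMS allows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys

# parent_id points back at the same table; NULL marks a top-level category.
conn.execute(
    "CREATE TABLE category ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " parent_id INTEGER REFERENCES category(id) ON DELETE CASCADE)"
)
conn.execute("INSERT INTO category VALUES (1, 'Books', NULL)")
conn.execute("INSERT INTO category VALUES (2, 'Fiction', 1)")
conn.execute("INSERT INTO category VALUES (3, 'Sci-Fi', 2)")

# A category whose parent does not exist is refused.
try:
    conn.execute("INSERT INTO category VALUES (4, 'Lost', 99)")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False

# Deleting the root cascades through both levels of children.
conn.execute("DELETE FROM category WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM category").fetchone()[0]
print(orphan_accepted, remaining)
```

Without the cascade clause the delete of a parent that still has children would be rejected instead, which is the other behaviour the answers describe.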
Yes, you should. If you have an attribute in a relation of a database that serves as the primary key of another relation in the same database, you should make it a FK.
You will enjoy the advantages associated to foreign keys:
Assuming the proper design of the relationships, foreign key constraints make it more difficult for a programmer to introduce an inconsistency into the database.
Centralizing the checking of these constraints by the database server makes it unnecessary to perform these checks on the application side. This eliminates the possibility that different applications may not check constraints in the same way.
Using cascading updates and deletes can simplify the application code.
Properly designed foreign key rules aid in documenting relationships between tables.
The disadvantages:
If you define Foreign Keys, sometimes it is harder to perform bulk operations.
Maybe it implies more disk usage and a slight performance hit.
Yes you should.
Advantages (as for any foreign key):
Ensures that parent_id references a real row in the table
Prevents accidental deletion of a parent that has children, or ensures that the delete cascades to delete the children also
Provides information the optimizer can use
I can't think of any real disadvantages.
Yes, you should make it a foreign key.
The benefits will be a better data model with less redundancy.

Resources