Do we need to explicitly mention which column is foreign key column? - database

When we create relational database tables, we have to use foreign key columns. That is obvious; otherwise we cannot create relationships.
However, I noticed that it is enough to have a foreign key column; you do not need to say that there is a foreign key relationship between table A and table B.
As long as you can write the queries, you can retrieve the data.
Do we use this concept to make things easy? I know that when I look at a table schema with the foreign key columns marked, it is easier to understand and start working with.
Are there any other reasons?

The point is referential integrity. If you don't enforce it, sooner or later a bug in the code or some other accident will leave your database in an inconsistent state. These inconsistencies are very hard or impossible to fix afterwards.

"When we create relational database tables, we have to use foreign key columns. It is obvious, otherwise we can not create relationships."
Incorrect. You do not need to create foreign keys (though it's a good idea), and they do not represent relationships. They enforce the integrity of the relationship. A foreign key makes sure that a value in one column exists in another column.
"However, I noticed that it is enough to have a foreign key column, you do not need to say that there is a foreign key relationship in table A with table B. As long as you can write the queries you can retrieve the data."
Yes, the relationship is based on the data itself, not on the inclusion of a foreign key. Also, foreign keys do not need to be between two tables; a table can have a foreign key to itself.
"Do we use this concept to make things easy?"
No, we use foreign keys to enforce integrity. That they happen to make ER diagrams easier to understand is simply a bonus.
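To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The author, book_plain and book_fk tables are hypothetical names made up for the illustration; they are not from the question. The join works the same against either table, but only the declared foreign key stops a row that points at nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces declared foreign keys when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- author_id is only a convention here; nothing checks it.
    CREATE TABLE book_plain (
        id INTEGER PRIMARY KEY,
        author_id INTEGER
    );
    -- Same column, but declared as a foreign key.
    CREATE TABLE book_fk (
        id INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES author(id)
    );
    INSERT INTO author (id, name) VALUES (1, 'Ann');
""")

# Without the constraint, a row pointing at a missing author is accepted.
conn.execute("INSERT INTO book_plain (author_id) VALUES (999)")
orphans = conn.execute("""
    SELECT COUNT(*)
    FROM book_plain AS b
    LEFT JOIN author AS a ON a.id = b.author_id
    WHERE a.id IS NULL
""").fetchone()[0]

# With the constraint, the same insert is refused.
try:
    conn.execute("INSERT INTO book_fk (author_id) VALUES (999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("orphan rows in book_plain:", orphans)
print("orphan insert into book_fk rejected:", rejected)
```

Other database engines report the violation with their own error types, but the behaviour should be comparable.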

Related

ForeignKey column comes from a lot of tables

So I have a table with a column named parentKey. This column actually holds keys (which by definition are foreign keys) to MANY other tables (at least 4). It seems strange to me to even create a column like this, and I haven't yet seen a table built this way, because you can't add a foreign key constraint when the column doesn't link to one single table. So I don't know if this should be allowed to exist. I mean, it's there and it has been created, but I'm not sure if I should leave it like this.
My idea is to create a column for each of the possible tables, name them accordingly (MyTable1Key, MyTable2Key) and make them foreign keys. The problem with that is that if one of the foreign keys is assigned, the others will be null (and they will never be assigned, so they will always stay null).
So should I leave this parentKey column as it is, or should I split it into separate columns linked to their tables by foreign keys and accept null values in some columns?
Unless you have a good reason, do not combine multiple foreign keys into a single column. As you've already noted, it removes the referential integrity of your foreign key.
Either you risk having a key which could belong to two tables, or you have a master table somewhere that you should use as your foreign key reference. It is possible to have a primary key as a foreign key.
It sounds like you may be looking at the supertype-subtype pattern, in which case this question might give you some good ideas: How do I apply subtypes into an SQL Server database?
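A rough sketch of the "one column per parent table" idea from the question, using Python's sqlite3 module. The table and column names (my_table1, my_table2, child) are placeholders. The CHECK constraint requiring exactly one non-null parent is an addition of this sketch, not something the question or answer proposed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce the FKs
conn.executescript("""
    CREATE TABLE my_table1 (id INTEGER PRIMARY KEY);
    CREATE TABLE my_table2 (id INTEGER PRIMARY KEY);
    CREATE TABLE child (
        id INTEGER PRIMARY KEY,
        my_table1_key INTEGER REFERENCES my_table1(id),
        my_table2_key INTEGER REFERENCES my_table2(id),
        -- Assumed rule: each row belongs to exactly one parent.
        CHECK ((my_table1_key IS NOT NULL) + (my_table2_key IS NOT NULL) = 1)
    );
    INSERT INTO my_table1 (id) VALUES (1);
    INSERT INTO my_table2 (id) VALUES (1);
""")

def try_insert(t1_key, t2_key):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO child (my_table1_key, my_table2_key) VALUES (?, ?)",
                (t1_key, t2_key),
            )
        return True
    except sqlite3.IntegrityError:
        return False

results = {
    "one parent": try_insert(1, None),
    "both parents": try_insert(1, 1),
    "no parent": try_insert(None, None),
    "missing parent": try_insert(None, 42),
}
print(results)
```

The null columns cost little, and each remaining value is checked against the right table, which a single shared parentKey column cannot offer.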

Should foreign keys become table primary key?

I have a table (session_comments) with the following fields structure:
student_id (foreign key to students table)
session_id (foreign key to sessions table)
session_subject_ID (foreign key to session_subjects table)
user_id (foreign key to users table)
comment_date_time
comment
Now, the combination of student_id, session_id, and session_subject_id will uniquely identify a comment about that student for that session subject.
Given that combined they are unique, even though they are foreign keys, is there an advantage to me making them the combined primary key for that table?
Thanks again.
Making them the primary key will force uniqueness (as opposed to imply it).
The primary key will presumably be clustered (depending on the dbms) which will improve performance for some queries.
It saves the space of adding a unique constraint which in some DBMS also creates a unique index.
Whether you make these three the primary key or not, you will still need some sort of uniqueness constraint to guarantee that a student cannot be associated with the same session and session_subject_id twice. If that scenario is allowed, then you would need to expand your uniqueness constraint out to include another column.
No matter what choice you make, you should absolutely have some sort of uniqueness constraint on the table.
If you are debating as to whether to create a surrogate primary key + a unique constraint on the three columns, I would say that it depends on whether this table will have child tables. If it will, then referencing the surrogate key will be easier and smaller. If it will not, then IMO, the surrogate key does not really give you much and you might as well use the three columns as the PK.
It depends on the rest of the application.
If you're not going to have foreign keys to the comments table (which seems probable), this is fine.
If you will need to refer to comments from another table, you'd be better to create a unique index with your 3 fields, plus an AutoNumber primary key that will serve in other tables as the foreign key (much simpler and cheaper than the 3 fields).
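Both layouts discussed above can be sketched side by side with Python's sqlite3 module. To keep the example short, the parent tables and the REFERENCES clauses are left out. The comment_id column name and the _a/_b table suffixes are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Variant A: the three foreign key columns are the primary key.
    CREATE TABLE session_comments_a (
        student_id INTEGER NOT NULL,
        session_id INTEGER NOT NULL,
        session_subject_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        comment TEXT,
        PRIMARY KEY (student_id, session_id, session_subject_id)
    );
    -- Variant B: surrogate primary key plus a unique constraint.
    CREATE TABLE session_comments_b (
        comment_id INTEGER PRIMARY KEY,
        student_id INTEGER NOT NULL,
        session_id INTEGER NOT NULL,
        session_subject_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        comment TEXT,
        UNIQUE (student_id, session_id, session_subject_id)
    );
""")

def accepts_duplicate(table):
    """Insert the same student/session/subject twice; report if the copy got in."""
    sql = (
        f"INSERT INTO {table} "
        "(student_id, session_id, session_subject_id, user_id, comment) "
        "VALUES (?, ?, ?, ?, ?)"
    )
    row = (1, 10, 100, 7, "first comment")
    conn.execute(sql, row)
    try:
        conn.execute(sql, row)
        return True
    except sqlite3.IntegrityError:
        return False

dup_a = accepts_duplicate("session_comments_a")
dup_b = accepts_duplicate("session_comments_b")
print("composite PK accepts duplicate:", dup_a)
print("surrogate PK + UNIQUE accepts duplicate:", dup_b)
```

Either variant guards the uniqueness rule; the difference is mainly what a child table would have to reference (three columns in A, one in B).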
The debate of natural vs artificial keys is as old as any database implementation.
Read about the pros and cons on Wikipedia.
Arguments for surrogate keys are easily disputed on a theoretical level. For example, the argument that with natural keys you run the risk of your PK becoming non-unique can be countered with: good! If I run into that situation, it is better that things break than to have artificially unique primary keys over duplicate records of actual data.
Another good argument is that artificial keys are either redundant (there is another unique key on the table) or they allow you to store essentially non-unique records.
Still, finding good natural keys is sometimes so hard that you must choose something artificial and allow for the situation where you have two people with the same name, born on the same date (or with an unknown date), and with every other recorded property equal.
Also, it is not so clear what is artificial and what is natural.
You might say, for example, that an SSN is natural for your data, even though it is really a composed number.
As for the performance of multi-key relationships: these are not as bad as you would think. Furthermore, such keys segment the indices in a natural way, and you usually end up with a database that performs really nicely with common queries without any additional indexes.
If you take these problems seriously and are trying to build a complex system, please read some good literature (C. J. Date's An Introduction to Database Systems, currently in its 8th edition, comes to mind).
I'd really recommend you use a primary key that's generated for you by your database of choice, mainly because if you alter the structure of that table during any future maintenance, you run the risk of your unique key becoming non-unique, which can be a really tough problem to sort out. Also, having a unique primary key makes querying the table much, much easier.
Unique IDs for postgres: http://www.postgresql.org/docs/8.1/interactive/datatype.html#DATATYPE-SERIAL
Unique IDs for Mysql: http://dev.mysql.com/doc/refman/5.0/en/example-auto-increment.html
The only reason to make them into a composite primary key would be to enforce one comment per student/Session/Subject. Assuming you don't want to do that, I would not create another key.
No. FOREIGN keys can contain NULLs which are not allowed in PRIMARY keys. The best you can do is create a UNIQUE index from the columns.
Create a PRIMARY key on the table.
Response: My next question is:
Is there a possibility of overlap between the keys from the 4 tables?
These two would create the same composite key of 101010101:
student: 1010,session: 10,subject: 10,user: 1
student: 10,session: 1010,subject: 10,user: 1
I'm just pointing out that the four columns should have clearly different domains to reduce the possibility of overlap.
Probably best to go with a true primary key.

Foreign Key Useful in SQLite?

I have two tables 'Elements' and 'Lists'
Lists has a primary key and a list name.
Elements has data pertaining to an individual entry in the list.
Elements needs a column that holds which list the element is in.
I've read about SQL's foreign key constraint and figure that is the best way to link the tables, but I'm using SQLite, which doesn't enforce the foreign key constraint.
Is there a point to declaring the foreign key constraint if there is no enforcement?
It's always good to do, even if your database doesn't enforce the constraint (old MySQL, for instance). The reasoning is that someday, someone will try reading your schema (perhaps even yourself).
If you can't use the new version, you can still declare the constraint and enforce it with triggers. In either case, I wouldn't omit the notation. It's far too helpful.
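Nowadays SQLite enforces foreign keys (since version 3.6.19), so download the new release. Note that enforcement is off by default and has to be switched on for each connection with PRAGMA foreign_keys = ON.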
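A small sketch of what that looks like from Python's sqlite3 module, using a lists/elements schema modelled on the question (the column names are guesses). The pragma is set explicitly in both cases so the result does not depend on how a particular SQLite build was compiled.

```python
import sqlite3

SCHEMA = """
    CREATE TABLE lists (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE elements (
        id INTEGER PRIMARY KEY,
        list_id INTEGER NOT NULL REFERENCES lists(id),
        data TEXT
    );
"""

def orphan_insert_allowed(enforce):
    """Try to add an element whose list does not exist."""
    conn = sqlite3.connect(":memory:")
    try:
        # The pragma is per connection and must be set outside a transaction.
        conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce else 'OFF'}")
        conn.executescript(SCHEMA)
        try:
            conn.execute(
                "INSERT INTO elements (list_id, data) VALUES (?, ?)",
                (12345, "orphan"),
            )
            return True
        except sqlite3.IntegrityError:
            return False
    finally:
        conn.close()

print("enforcement off, orphan accepted:", orphan_insert_allowed(False))
print("enforcement on, orphan accepted:", orphan_insert_allowed(True))
```

So the declaration is the same either way; only the pragma decides whether SQLite checks it.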
"A foreign key is a field (or fields) that points to the primary key of another table. The purpose of the foreign key is to ensure referential integrity of the data. In other words, only values that are supposed to appear in the database are permitted."
It only enforces the "business rule". If you require this from the business side, then yes, it is required.
Indexing will not be affected.
You can still create indexes as required.
Have a look at Foreign Key and Wikipedia Foreign key.

Should a database table always have primary keys?

Should I always have a primary key in my database tables?
Let's take the SO tagging. You can see the tags in any revision; they are likely to be in a tag_rev table with the postID and revision number. Would I need a PK for that?
Also, since it is in a rev table and not currently in use, should the tags be a blob of tagIDs instead of multiple entries of post_id/tagid pairs?
A table should have a primary key so that you could identify each row uniquely with it.
Technically, you can have tables without a primary key, but you'll be breaking good database design rules.
You should strive to have a primary key in any non-trivial table where you're likely to want to access (or update or delete) individual records by that key. Primary keys can consist of multiple columns, and formally speaking, will be the shortest available superkey; that is, the shortest available group of columns which, together, uniquely identify any row.
I don't know what the Stack Overflow database schema looks like (and from some of the things I've read on Jeff's blog, I don't want to), but in the situation you describe, it's entirely possible there is a primary key across the post identifier, revision number and tag value; certainly, that would be the shortest (and only) superkey available.
With regards to your second point, while it may be reasonable to argue in favour of aggregating values in archive tables, it does go against the principle that each row/column intersection in a table ought to contain one single value. While it may slightly simplify development, there is no reason you can't keep to a normalised table with versioned metadata, even for something as trivial as tags.
I tend to agree that most tables should have a primary key. I can only think of two times where it doesn't make sense to do it.
If you have a table that relates keys to other keys. For example, to relate a user_id to an answer_id, that table wouldn't need a primary key.
A logging table, whose only real purpose is to create an audit trail.
Basically, if you are writing a table that may ever need to be referenced in a foreign key relationship then a primary key is important, and if you can't be positive it won't be, then just add the PK. :)
See this related question about whether an integer primary key is required. One of the answers uses tagging as an example:
Are there any good reasons to have a database table without an integer primary key
For more discussion of tagging and keys, see this question:
Id for tags in tag systems
From MySQL 5.5 Reference Manual section 13.1.17:
If you do not have a PRIMARY KEY and an application asks for the PRIMARY KEY in your tables, MySQL returns the first UNIQUE index that has no NULL columns as the PRIMARY KEY.
So, technically, the answer is no. However, as others have stated, in most cases it is quite useful.
I firmly believe every table should have a way to uniquely identify a record. For 99% of the tables, this is a primary key. For the rest you may get away with a unique index (I'm thinking of one-column lookup tables here). Any time I have had to work with a table without a way to uniquely identify records, there has been trouble.
I also believe that if you are using surrogate keys as your PK, you should, where at all possible, have a separate unique index on whatever combination of fields makes up the natural key. I realize there are all too many times when you don't have a true natural key (names are not unique, or what makes something unique might be spread across several parent-child tables), but if you do have one, please please please make sure it has a unique index or is created as the PK.
If there is no PK, how will you update or delete a single row? It would be impossible! To be honest, I have a few times used tables without a PK, for instance to store activity logs, but even in this case it is advisable to have one, because the timestamps might not be granular enough. Temporary tables are another example. But according to relational theory the PK is mandatory.
It is good to have keys and relationships; it helps a lot. However, if your app is good enough to handle the relationships, then you could possibly skip the keys (although I recommend that you have them).
Since I use Subsonic, I always create a primary key for all of my tables. Many DB Abstraction libraries require a primary key to work.
Note: that doesn't answer the "Grand Unified Theory" tone of your question, but I'm just saying that in practice, sometimes you MUST make a primary key for every table.
If it's a join table then I wouldn't say that you need a primary key. Suppose, for example, that you have tables PERSONS, SICKPEOPLE, and ILLNESSES. The ILLNESSES table has things like flu, cold, etc., each with a primary key. PERSONS has the usual stuff about people, each also with a primary key. The SICKPEOPLE table only has people in it who are sick, and it has two columns, PERSONID and ILLNESSID, foreign keys back to their respective tables, and no primary key. The PERSONS and ILLNESSES tables contain entities and entities get primary keys. The entries in the SICKPEOPLE table aren't entities and don't get primary keys.
Databases don't have keys, per se, but their constituent tables might. I assume you mean that, but just in case...
Anyway, tables with a large number of rows should absolutely have primary keys; tables with only a few rows don't need them, necessarily, though they don't hurt. It depends upon the usage and the size of the table. Purists will put primary keys in every table. This is not wrong; and neither is omitting PKs in small tables.
Edited to add a link to my blog entry on this question, in which I discuss a case in which database administration staff did not consider it necessary to include a primary key in a particular table. I think this illustrates my point adequately.
Cyberherbalist's Blog Post on Primary Keys

One or Two Primary Keys in Many-to-Many Table?

I have the following tables in my database that have a many-to-many relationship, which is expressed by a connecting table that has foreign keys to the primary keys of each of the main tables:
Widget: WidgetID (PK), Title, Price
User: UserID (PK), FirstName, LastName
Assume that each User-Widget combination is unique. I can see two options for how to structure the connecting table that defines the data relationship:
UserWidgets1: UserWidgetID (PK), WidgetID (FK), UserID (FK)
UserWidgets2: WidgetID (PK, FK), UserID (PK, FK)
Option 1 has a single column for the Primary Key. However, this seems unnecessary since the only data being stored in the table is the relationship between the two primary tables, and this relationship itself can form a unique key. Thus leading to option 2, which has a two-column primary key, but loses the one-column unique identifier that option 1 has. I could also optionally add a two-column unique index (WidgetID, UserID) to the first table.
Is there any real difference between the two performance-wise, or any reason to prefer one approach over the other for structuring the UserWidgets many-to-many table?
You only have one primary key in either case. The second one is what's called a compound key. There's no good reason for introducing a new column. In practise, you will have to keep a unique index on all candidate keys. Adding a new column buys you nothing but maintenance overhead.
Go with option 2.
Personally, I would have the synthetic/surrogate key column in many-to-many tables for the following reasons:
If you've used numeric synthetic keys in your entity tables then having the same on the relationship tables maintains consistency in design and naming convention.
It may be the case in the future that the many-to-many table itself becomes a parent entity to a subordinate entity that needs a unique reference to an individual row.
It's not really going to use that much additional disk space.
The synthetic key is not a replacement for the natural/compound key, nor does it become the PRIMARY KEY for that table just because it's the first column in the table, so I partially agree with the Josh Berkus article. However, I don't agree that natural keys are always good candidates for PRIMARY KEYs, and they certainly should not be used if they are to be used as foreign keys in other tables.
Option 2 uses a simple compound key, option 1 uses a surrogate key. Option 2 is preferred in most scenarios and is close to the relational model in that it is a good candidate key.
There are situations where you may want to use a surrogate key (Option 1):
You are not certain that the compound key is a good candidate key over time, particularly with temporal data (data that changes over time). What if you wanted to add another row to the UserWidget table with the same UserId and WidgetId? Think of Employment(EmployeeId, EmployerId): it would work in most cases, except if someone went back to work for the same employer at a later date.
If you are creating messages/business transactions or something similar that requires an easier key to use for integration. Replication maybe?
If you want to create your own auditing mechanisms (or similar) and don't want keys to get too long.
As a rule of thumb, when modeling data you will find that most associative entities (many to many) are the result of an event. Person takes up employment, item is added to basket etc. Most events have a temporal dependency on the event, where the date or time is relevant - in which case a surrogate key may be the best alternative.
So, take option 2, but make sure that you have the complete model.
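The Employment example can be sketched with Python's sqlite3 module. The table names, the employment_id surrogate column and the start_date column are assumptions made for the illustration, and the foreign key declarations are omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Compound key on the pair alone: one row per employee/employer, ever.
    CREATE TABLE employment_pair (
        employee_id INTEGER NOT NULL,
        employer_id INTEGER NOT NULL,
        PRIMARY KEY (employee_id, employer_id)
    );
    -- Surrogate key, with the event date included in the uniqueness rule.
    CREATE TABLE employment_dated (
        employment_id INTEGER PRIMARY KEY,
        employee_id INTEGER NOT NULL,
        employer_id INTEGER NOT NULL,
        start_date TEXT NOT NULL,
        UNIQUE (employee_id, employer_id, start_date)
    );
""")

def insert_all(sql, rows):
    """Insert every row in one transaction; False if any row is rejected."""
    try:
        with conn:
            conn.executemany(sql, rows)
        return True
    except sqlite3.IntegrityError:
        return False

# Employee 1 works for employer 50, leaves, and is rehired later.
pair_ok = insert_all(
    "INSERT INTO employment_pair (employee_id, employer_id) VALUES (?, ?)",
    [(1, 50), (1, 50)],
)
dated_ok = insert_all(
    "INSERT INTO employment_dated (employee_id, employer_id, start_date) "
    "VALUES (?, ?, ?)",
    [(1, 50, "2015-03-01"), (1, 50, "2021-09-01")],
)
print("pair-only key allows rehire:", pair_ok)
print("dated table allows rehire:", dated_ok)
```

Whether the date belongs in a wider compound key or next to a surrogate key is the judgment call described above; the sketch only shows why the bare pair stops being a candidate key once the relationship is really an event.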
I agree with the previous answers but I have one remark to add.
If you want to add more information to the relation and allow more relations between the same two entities you need option one.
For example, if you want to track all the times user 1 has used widget 664 in the userwidget table, the combination of userid and widgetid isn't unique anymore.
What is the benefit of a primary key in this scenario? Consider the option of no primary key:
UserWidgets3: WidgetID (FK), UserID (FK)
If you want uniqueness then use either the compound key (UserWidgets2) or a uniqueness constraint.
The usual performance advantage of having a primary key is that you often query the table by the primary key, which is fast. In the case of many-to-many tables you don't usually query by the primary key so there is no performance benefit. Many-to-many tables are queried by their foreign keys, so you should consider adding indexes on WidgetID and UserID.
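Here is one way the compound-key layout and the foreign key indexes might look, sketched with Python's sqlite3 module. Table and column names are adapted from the question (pluralised and lower-cased), and the sample rows are made up. The compound primary key already provides an index that leads with widget_id, so the sketch only adds a separate index on user_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE widgets (widget_id INTEGER PRIMARY KEY, title TEXT, price REAL);
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE user_widgets (
        widget_id INTEGER NOT NULL REFERENCES widgets(widget_id),
        user_id INTEGER NOT NULL REFERENCES users(user_id),
        PRIMARY KEY (widget_id, user_id)
    );
    -- Lookups by user need their own index; the PK index starts with widget_id.
    CREATE INDEX user_widgets_by_user ON user_widgets (user_id);

    INSERT INTO widgets VALUES (1, 'Sprocket', 9.5), (2, 'Gear', 4.0);
    INSERT INTO users VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
    INSERT INTO user_widgets (widget_id, user_id) VALUES (1, 1), (2, 1), (1, 2);
""")

# Typical access path: query the connecting table by one of its foreign keys.
widgets_of_user_1 = [
    title
    for (title,) in conn.execute(
        """
        SELECT w.title
        FROM user_widgets AS uw
        JOIN widgets AS w ON w.widget_id = uw.widget_id
        WHERE uw.user_id = ?
        ORDER BY w.title
        """,
        (1,),
    )
]

# The compound key also rejects a repeated user/widget pair.
try:
    conn.execute("INSERT INTO user_widgets (widget_id, user_id) VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(widgets_of_user_1)
print("duplicate pair rejected:", duplicate_rejected)
```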
Option 2 is the correct answer, unless you have a really good reason to add a surrogate numeric key (which you have done in option 1).
Surrogate numeric key columns are not 'primary keys'. A primary key is technically one of the combinations of columns that uniquely identify a record within a table.
Anyone building a database should read this article http://it.toolbox.com/blogs/database-soup/primary-keyvil-part-i-7327 by Josh Berkus to understand the difference between surrogate numeric key columns and primary keys.
In my experience the only real reason to add a surrogate numeric key to your table is if your primary key is a compound key and needs to be used as a foreign key reference in another table. Only then should you even think to add an extra column to the table.
Whenever I see a database structure where every table has an 'id' column the chances are it has been designed by someone who doesn't appreciate the relational model and it will invariably display one or more of the problems identified in Josh's article.
I would go with both.
Hear me out:
The compound key is obviously the nice, correct way to go in so far as reflecting the meaning of your data goes. No question.
However: I have had all sorts of trouble making Hibernate work properly unless you use a single generated primary key - a surrogate key.
So I would use a logical and physical data model. The logical one has the compound key. The physical model - which implements the logical model - has the surrogate key and foreign keys.
Since each User-Widget combination is unique, you should represent that in your table by making the combination unique. In other words, go with option 2. Otherwise you may have two entries with the same widget and user IDs but different user-widget IDs.
The userwidgetid in the first table is not needed since, as you said, the uniqueness comes from the combination of the widgetid and the userid.
I would use the second table, keep the foreign keys and add a unique index on widgetid and userid.
So:
userwidgets( widgetid(fk), userid(fk),
unique_index(widgetid, userid)
)
There is some performance gain in not having the extra primary key, as the database would not need to maintain the index for the key. In the above model, though, this index (through the unique_index) is still maintained, but I believe that this is easier to understand.
