Choosing where to attach a foreign key between two tables?

Choosing where to attach a foreign key between two tables? - sql-server

I have a purchase order table and another table to contain the items within a particular purchase order for drugs.
Example:
PO_Table (POId, MainPharmacyID, SupplierID, PreparedBy)
PO_Items_Table (POItemID, ...)
I have two options of choosing which table to link to which and they seem both valid. i have done this a number of times and have done it either way.
I would love to know if their are any rules to where to attach a foreign?
In my situation where do i attach my foreign key?
Update:
My two options are putting POItemID in the PO_Table or putting POId in the PO_Items_Table.
Update 2:
Assuming the relationship between the two tables is a one-to-one relationship

Just make it point to the PRIMARY KEY of the referenced table:
PO_Table (POId PRIMARY KEY, MainPharmacyID, SupplierID, PreparedBy)
PO_Items_Table (POItemID, POId FOREIGN KEY REFERENCES PO_Table (POId), ...)
Actually, in your PO_Table I don't see any other candidate key except POId, so as for now this seems to be the only available solution to me.
What are the "two options" you are considering?
Update:
Putting POItemID in the PO_Table is not an option unless you want your orders to have no more than one item in them.
Just look into it: if you have but a single column which stores the id of the ordered item in the order table, where are you going to store the other items?
Update 2:
If there is a one-to-one relationship, normally you just merge the tables: combine all fields from both tables into a single record.
However, there are cases when you need to split the tables. Say, one of the entities is rarely defined but has too many fields.
In this case, you make a separate relation for the second entity and make its PRIMARY KEY column also a FOREIGN KEY.
Let's imagine a model which describes the locks and the keys, and the keys cannot be duplicated (so one lock matches at most one key and vice versa):
Pairs (PairID PRIMARY KEY, LockID UNIQUE, LockProductionDate, KeyId UNIQUE, KeyProductionDate)
If there is no key for a lock or no lock for a key, we just put NULLS into the corresponding fields.
However, if all keys have a lock but only few locks have keys, we can split the table:
Locks (LockID PRIMARY KEY, LockProductionDate, KeyID UNIQUE)
Keys (KeyID PRIMARY KEY, KeyProductionDate, FOREIGN KEY (KeyID) REFERENCES Locks (KeyID))
As you can see, the KeyID is both a PRIMARY KEY and a FOREIGN KEY in the Keys table.
You may want read this article in my blog:
What is entity-relationship model?
, which describes some ways to map ER model (entities and relationship) into the relational model (tables and foreign keys)

You don't have two options.. A Foreign Key constraint must be attached to the table, (and to the column) that has has the Foreign Key in it. And it must reference (or point to ) the Primary key in the other table. I don't quite understand what you mean when you say you have done this a number of times either way... What other Way ??

It looks like your PO_Table is the logical parent of the PO_Items_Table, which means the primary key of the PO_Table should be used as the Foreign Key in the items table

If PO stands for "Purchase Orders" and PO Item stands for a single line item of a purchase order, then you only have one choice about how to set up foreign keys. There may be many items for each purchase order, but there will only be one purchase order for each item. In this case, Quassnoi gave the correct design.
As a sidelight, every time I have designed a purchase order database, I have made the Items table have a compound primary key made up of POID and ItemID. But ItemID is not unique among all Items, just the items that belong to a single PO. Each time I start a new PO, I begin all over again with ItemID equal to one. This permits me to reconstruct a purchase order later on, and get the items in the same order as they were in when the order was first created. This is a trivial matter for most data processing purposes, but it can drive customers nuts if they look atr a PO later on, and the items are out of sequence, as they perceive sequence.

Related

Unique constraint to combination of two columns unclustered index

I'm not asking HOW to do this, but if it's what I SHOULD be doing.
Two employees can be working on the same job. So of course, both FKs, EmployeeID and JobID, can have a MANY relationship in a "Employee_Jobs" table.
Let's take Employee A, Employee B, Job A and Job B. All of the following would be acceptable:
A A
A B
B A
B B
What would NOT be acceptable is a duplicate of a combination of these two PKs... since we cannot have for example, [Employee A working on Job A] twice.
So would it be correct to say that the only way to manage this is to make the combination of the two PKs, EmployeeID and JobID, a Unique, non-clustered index?
I tried to think of how to instead, break this up to more tables but I keep getting back to this same problem.

Yes, not only is it appropriate, but in fact, the combination of these two attributes should be the PRIMARY KEY.
and in any other table where the entity represented by rows in the table has a logical attribute (consisting of the two columns employeeId and JobId), which represents the work done by an employee on a job, (or the contribution of the employee to a job, or the association of an employee to a job in any way), a FK in that table should be a composite Foreign Key consisting of these same two columns.
If you are using a surrogate key on this table to simplify joins and definition of Foreign Keys in other tables, then by all means continue to do so, but keep the two-column natural key in this table, as either a unique index or a Alternate Key. (a Key is a Key - anything that is declared or defined to be unique) so as to ensure data integrity in this table. In fact, to make it clear to users of the schema, when this situation comes up, I generally make the composite Natural Key the PRIMARY KEY, and add/define the surrogate (which is used in Joins and Other table FKs), as an alternate key or unique index. This is pretty much only a semantic distinction, only as they create almost identical functionality. But because data integrity is more important to me than join syntax and Foreign Key structure, To me, the Natural Key is the PRIMARY key,

Yes, In that case you should consider making both those fields as primary key; in specific a composite primary key or compound primary key like below which will make sure uniqueness of combination of both the fields.
primary key (EmployeeID , JobID)
Though as you said a Unique, non-clustered index but marking both the field as primary key will create a UNIQUE Clustered Index on them actually.

Why does my database table need a primary key?

In my database I have a list of users with information about them, and I also have a feature which allows a user to add other users to a shortlist. My user information is stored in one table with a primary key of the user id, and I have another table for the shortlist. The shortlist table is designed so that it has two columns and is basically just a list of pairs of names. So to find the shortlist for a particular user you retrieve all names from the second column where the id in the first column is a particular value.
The issue is that according to many sources such as this Should each and every table have a primary key? you should have a primary key in every table of the database.
According to this source http://www.w3schools.com/sql/sql_primarykey.asp - a primary key in one which uniquely identifies an entry in a database. So my question is:
What is wrong with the table in my database? Why does it need a primary key?
How should I give it a primary key? Just create a new auto-incrementing column so that each entry has a unique id? There doesn't seem much point for this. Or would I somehow encapsulate the multiple entries that represent a shortlist into another entity in another table and link that in? I'm really confused.

If the rows are unique, you can have a two-column primary key, although maybe that's database dependent. Here's an example:
CREATE TABLE my_table
(
col_1 int NOT NULL,
col_2 varchar(255) NOT NULL,
CONSTRAINT pk_cols12 PRIMARY KEY (col_1,col_2)
)
If you already have the table, the example would be:
ALTER TABLE my_table
ADD CONSTRAINT pk_cols12 PRIMARY KEY (col_1,col_2)

Primary keys must identify each record uniquely and as it was mentioned before, primary keys can consist of multiple attributes (1 or more columns). First, I'd recommend making sure each record is really unique in your table. Secondly, as I understand you left the table without primary key and that's disallowed so yes, you will need to set the key for it.

In this particular case, there is no purpose in same pair of user IDs being stored more than once in the shortlist table. After all, that table models a set, and an element is either in the set or isn't. Having an element "twice" in the set makes no sense1. To prevent that, create a composite key, consisting of these two user ID fields.
Whether this composite key will also be primary, or you'll have another key (that would act as surrogate primary key) is another matter, but either way you'll need this composite key.
Please note that under databases that support clustering (aka. index-organized tables), PK is often also a clustering key, which may have significant repercussions on performance.
1 Unlike in mutiset.

A table with duplicate rows is not an adequate representation of a relation. It's a bag of rows, not a set of rows. If you let this happen, you'll eventually find that your counts will be off, your sums will be off, and your averages will be off. In short, you'll get confusing errors out of your data when you go to use it.
Declaring a primary key is a convenient way of preventing duplicate rows from getting into the database, even if one of the application programs makes a mistake. The index you obtain is a side effect.
Foreign key references to a single row in a table could be made by referencing any candidate key. However, it's much more convenient if you declare one of those candidate keys as a primary key, and then make all foreign key references refer to the primary key. It's just careful data management.
The one-to-one correspondence between entities in the real world and corresponding rows in the table for that entity is beyond the realm of the DBMS. It's up to your applications and even your data providers to maintain that correspondence by not inventing new rows for existing entities and not letting some new entities slip through the cracks.

Well since you are asking, it's good practice but in a few instances (no joins needed to the data) it may not be absolutely required. The biggest problem though is you never really know if requirements will change and so you really want one now so you aren't adding one to a 10m record table after the fact.....
In addition to a primary key (which can span multiple columns btw) I think it is good practice to have a secondary candidate key which is a single field. This makes joins easier.
First some theory. You may remember the definition of a function from HS or college algebra is that y = f(x) where f is a function if and only if for every x there is exactly one y. In this case, in relational math we would say that y is functionally dependent on x on this case.
The same is true of your data. Suppose we are storing check numbers, checking account numbers, and amounts. Assuming that we may have several checking accounts and that for each checking account duplicate check numbers are not allowed, then amount is functionally dependent on (account, check_number). In general you want to store data together which is functionally dependent on the same thing, with no transitive dependencies. A primary key will typically be the functional dependency you specify as the primary one. This then identifies the rest of the data in the row (because it is tied to that identifier). Think of this as the natural primary key. Where possible (i.e. not using MySQL) I like to declare the primary key to be the natural one, even if it spans across columns. This gets complicated sometimes where you may have multiple interchangeable candidate keys. For example, consider:
CREATE TABLE country (
id serial not null unique,
name text primary key,
short_name text not null unique
);
This table really could have any column be the primary key. All three are perfectly acceptable candidate keys. Suppose we have a country record (232, 'United States', 'US'). Each of these fields uniquely identifies the record so if we know one we can know the others. Each one could be defined as the primary key.
I also recommend having a second, artificial candidate key which is just a machine identifier used for linking for joins. In the above example country.id does this. This can be useful for linking other records to the country table.
An exception to needing a candidate key might be where duplicate records really are possible. For example, suppose we are tracking invoices. We may have a case where someone is invoiced independently for two items with one showing on each of two line items. These could be identical. In this case you probably want to add an artificial primary key because it allows you to join things to that record later. You might not have a need to do so now but you may in the future!

Create a composite primary key.
To read more about what a composite primary key is, visit
http://www.relationaldbdesign.com/relational-database-analysis/module2/concatenated-primary-keys.php

Is it fine to have foreign key as primary key?

I have two tables:
User (username, password)
Profile (profileId, gender, dateofbirth, ...)
Currently I'm using this approach: each Profile record has a field named "userId" as foreign key which links to the User table. When a user registers, his Profile record is automatically created.
I'm confused with my friend suggestion: to have the "userId" field as the foreign and primary key and delete the "profileId" field. Which approach is better?

Foreign keys are almost always "Allow Duplicates," which would make them unsuitable as Primary Keys.
Instead, find a field that uniquely identifies each record in the table, or add a new field (either an auto-incrementing integer or a GUID) to act as the primary key.
The only exception to this are tables with a one-to-one relationship, where the foreign key and primary key of the linked table are one and the same.

Primary keys always need to be unique, foreign keys need to allow non-unique values if the table is a one-to-many relationship. It is perfectly fine to use a foreign key as the primary key if the table is connected by a one-to-one relationship, not a one-to-many relationship. If you want the same user record to have the possibility of having more than 1 related profile record, go with a separate primary key, otherwise stick with what you have.

Yes, it is legal to have a primary key being a foreign key. This is a rare construct, but it applies for:
a 1:1 relation. The two tables cannot be merged in one because of different permissions and privileges only apply at table level (as of 2017, such a database would be odd).
a 1:0..1 relation. Profile may or may not exist, depending on the user type.
performance is an issue, and the design acts as a partition: the profile table is rarely accessed, hosted on a separate disk or has a different sharding policy as compared to the users table. Would not make sense if the underlining storage is columnar.

Yes, a foreign key can be a primary key in the case of one to one relationship between those tables

I would not do that. I would keep the profileID as primary key of the table Profile
A foreign key is just a referential constraint between two tables
One could argue that a primary key is necessary as the target of any foreign keys which refer to it from other tables. A foreign key is a set of one or more columns in any table (not necessarily a candidate key, let alone the primary key, of that table) which may hold the value(s) found in the primary key column(s) of some other table. So we must have a primary key to match the foreign key.
Or must we? The only purpose of the primary key in the primary key/foreign key pair is to provide an unambiguous join - to maintain referential integrity with respect to the "foreign" table which holds the referenced primary key. This insures that the value to which the foreign key refers will always be valid (or null, if allowed).
http://www.aisintl.com/case/primary_and_foreign_key.html

It is generally considered bad practise to have a one to one relationship. This is because you could just have the data represented in one table and achieve the same result.
However, there are instances where you may not be able to make these changes to the table you are referencing. In this instance there is no problem using the Foreign key as the primary key. It might help to have a composite key consisting of an auto incrementing unique primary key and the foreign key.
I am currently working on a system where users can log in and generate a registration code to use with an app. For reasons I won't go into I am unable to simply add the columns required to the users table. So I am going down a one to one route with the codes table.

It depends on the business and system.
If your userId is unique and will be unique all the time, you can use userId as your primary key. But if you ever want to expand your system, it will make things difficult. I advise you to add a foreign key in table user to make a relationship with table profile instead of adding a foreign key in table profile.

Short answer: DEPENDS.... In this particular case, it might be fine. However, experts will recommend against it just about every time; including your case.
Why?
Keys are seldomly unique in tables when they are foreign (originated in another table) to the table in question. For example, an item ID might be unique in an ITEMS table, but not in an ORDERS table, since the same type of item will most likely exist in another order. Likewise, order IDs might be unique (might) in the ORDERS table, but not in some other table like ORDER_DETAILS where an order with multiple line items can exist and to query against a particular item in a particular order, you need the concatenation of two FK (order_id and item_id) as the PK for this table.
I am not DB expert, but if you can justify logically to have an auto-generated value as your PK, I would do that. If this is not practical, then a concatenation of two (or maybe more) FK could serve as your PK. BUT, I cannot think of any case where a single FK value can be justified as the PK.

It is not totally applied for the question's case, but since I ended up on this question serching for other info and by reading some comments, I can say it is possible to only have a FK in a table and get unique values.
You can use a column that have classes, which can only be assigned 1 time, it works almost like and ID, however it could be done in the case you want to use a unique categorical value that distinguish each record.

2 primary keys in a table

Making a primary key in a table in database is fine. Making a Composite Primary is also fine. But why cant I have 2 primary keys in a table? What kind of problems may occur if we have 2 primary keys.
Suppose I have a Students table. I don't want Roll No. and Names of each student to be unique. Then why can't I create 2 primary keys in a table? I don't see any logical problem in it now. But definitely I am missing a serious issue that's the reason it does not exist.
I am new in databases, so don't have much idea. It may also create a technical issue rather. Will be happy if someone can educate me on this.
Thanks.

You can create a UNIQUE constraint for both columns UNIQUE(roll,name).

The PK is unique by definition, cause it is used to identify a row from the others, for example, when a foreign key references that table, it is referencing the PK.
If you need another column to 'act' like a PK, give it the attributes unique and not null.

Well, this is simply by definition. There can not be two "primary" conditions, just like there can not be two "latest" versions.
Every table can contain more than one unique keys, but if you decide to have a primary key, this is just one of these unique keys, the "one" you deem the "most important", which identifies every record uniquely.
If you have a table and come to the conclusion that your primary key does not uniquely identify each record (also meaning that there can't be two records with the same values for the primary key), you have chosen the wrong primary key, as by definition, the fields of the primary key must uniquely define each record.
That, however, does not mean there can be no other combination of fields uniquely identifying the record! This is where a second feature kicks in: referential integrity.
You can "link" tables using their primary key. For example: If you have a Customer table and an Orders table, where the Customers table has a primary key on the customer number and the Orders table has a primary key on the order number and the customer number, that means:
Every customer can be identified uniquely by his customer number
Every order is uniquely identified by the order number and the customer number
You can then link the two tables on the customer number. The DB system then ensures several things, among which is the fact that you can not remove a customer who has orders in your database without first removing the orders. Otherwise, you would have orders without being able to find out the customer data, which would violate your database's referential integrity.
If you had two primary keys, the system would not know on which to ensure referential integrity, so you'd have to tell the system which key to use - which would make one of the primary keys more important, which would make it the "primary key" (!) of the primary keys.

You can have multiple candidate keys in a table but by convention only one key per table is called "primary". That's just a convention though and it doesn't make any real difference to the function of the keys. A primary key is no different to any other candidate key. If you find it convenient to call more than one key "primary" then I suggest you do so. In my opinion (I'm not the only one) the idea of designating a "primary" key at all is essentially an outdated concept of very little importance in database design.
You might be interested to know that early papers on the relational database model (e.g. by E.F.Codd, the relational model's inventor) actually used the term "primary key" to describe all the keys of a relation and not just one. So there is a perfectly good precedent for multiple primary keys per table. The idea of designating exactly one primary key is more recent and probably came into common use through the popularity of ER modelling techniques.

Create an unique index on the 2nd attribute (Names), it's almost the same as primary key with another name.
From Wikipedia (http://en.wikipedia.org/wiki/Unique_key):
A table can have at most one primary key, but more than one unique
key. A primary key is a combination of columns which uniquely specify
a row. It is a special case of unique keys. One difference is that
primary keys have an implicit NOT NULL constraint while unique keys do
not. Thus, the values in unique key columns may or may not be NULL,
and in fact such a column may contain at most one NULL fields.
Another difference is that primary keys must be defined using another
syntax.

One or Two Primary Keys in Many-to-Many Table?

I have the following tables in my database that have a many-to-many relationship, which is expressed by a connecting table that has foreign keys to the primary keys of each of the main tables:
Widget: WidgetID (PK), Title, Price
User: UserID (PK), FirstName, LastName
Assume that each User-Widget combination is unique. I can see two options for how to structure the connecting table that defines the data relationship:
UserWidgets1: UserWidgetID (PK), WidgetID (FK), UserID (FK)
UserWidgets2: WidgetID (PK, FK), UserID (PK, FK)
Option 1 has a single column for the Primary Key. However, this seems unnecessary since the only data being stored in the table is the relationship between the two primary tables, and this relationship itself can form a unique key. Thus leading to option 2, which has a two-column primary key, but loses the one-column unique identifier that option 1 has. I could also optionally add a two-column unique index (WidgetID, UserID) to the first table.
Is there any real difference between the two performance-wise, or any reason to prefer one approach over the other for structuring the UserWidgets many-to-many table?

You only have one primary key in either case. The second one is what's called a compound key. There's no good reason for introducing a new column. In practise, you will have to keep a unique index on all candidate keys. Adding a new column buys you nothing but maintenance overhead.
Go with option 2.

Personally, I would have the synthetic/surrogate key column in many-to-many tables for the following reasons:
If you've used numeric synthetic keys in your entity tables then having the same on the relationship tables maintains consistency in design and naming convention.
It may be the case in the future that the many-to-many table itself becomes a parent entity to a subordinate entity that needs a unique reference to an individual row.
It's not really going to use that much additional disk space.
The synthetic key is not a replacement to the natural/compound key nor becomes the PRIMARY KEY for that table just because it's the first column in the table, so I partially agree with the Josh Berkus article. However, I don't agree that natural keys are always good candidates for PRIMARY KEY's and certainly should not be used if they are to be used as foreign keys in other tables.

Option 2 uses a simple compund key, option 1 uses a surrogate key. Option 2 is preferred in most scenarios and is close to the relational model in that it is a good candidate key.
There are situations where you may want to use a surrogate key (Option 1)
You are not certain that the compound key is a good candidate key over time. Particularly with temporal data (data that changes over time). What if you wanted to add another row to the UserWidget table with the same UserId and WidgetId? Think of Employment(EmployeeId,EmployeeId) - it would work in most cases except if someone went back to work for the same employer at a later date
If you are creating messages/business transactions or something similar that requires an easier key to use for integration. Replication maybe?
If you want to create your own auditing mechanisms (or similar) and don't want keys to get too long.
As a rule of thumb, when modeling data you will find that most associative entities (many to many) are the result of an event. Person takes up employment, item is added to basket etc. Most events have a temporal dependency on the event, where the date or time is relevant - in which case a surrogate key may be the best alternative.
So, take option 2, but make sure that you have the complete model.

I agree with the previous answers but I have one remark to add.
If you want to add more information to the relation and allow more relations between the same two entities you need option one.
For example if you want to track all the times user 1 has used widget 664 in the userwidget table the userid and widgetid isn't unique anymore.

What is the benefit of a primary key in this scenario? Consider the option of no primary key:
UserWidgets3: WidgetID (FK), UserID (FK)
If you want uniqueness then use either the compound key (UserWidgets2) or a uniqueness constraint.
The usual performance advantage of having a primary key is that you often query the table by the primary key, which is fast. In the case of many-to-many tables you don't usually query by the primary key so there is no performance benefit. Many-to-many tables are queried by their foreign keys, so you should consider adding indexes on WidgetID and UserID.

Option 2 is the correct answer, unless you have a really good reason to add a surrogate numeric key (which you have done in option 1).
Surrogate numeric key columns are not 'primary keys'. Primary keys are technically one of the combination of columns that uniquely identify a record within a table.
Anyone building a database should read this article http://it.toolbox.com/blogs/database-soup/primary-keyvil-part-i-7327 by Josh Berkus to understand the difference between surrogate numeric key columns and primary keys.
In my experience the only real reason to add a surrogate numeric key to your table is if your primary key is a compound key and needs to be used as a foreign key reference in another table. Only then should you even think to add an extra column to the table.
Whenever I see a database structure where every table has an 'id' column the chances are it has been designed by someone who doesn't appreciate the relational model and it will invariably display one or more of the problems identified in Josh's article.

I would go with both.
Hear me out:
The compound key is obviously the nice, correct way to go in so far as reflecting the meaning of your data goes. No question.
However: I have had all sorts of trouble making hibernate work properly unless you use a single generated primary key - a surrogate key.
So I would use a logical and physical data model. The logical one has the compound key. The physical model - which implements the logical model - has the surrogate key and foreign keys.

Since each User-Widget combination is unique, you should represent that in your table by making the combination unique. In other words, go with option 2. Otherwise you may have two entries with the same widget and user IDs but different user-widget IDs.

The userwidgetid in the first table is not needed, as like you said the uniqueness comes from the combination of the widgetid and the userid.
I would use the second table, keep the foriegn keys and add a unique index on widgetid and userid.
So:
userwidgets( widgetid(fk), userid(fk),
unique_index(widgetid, userid)
)
There is some preformance gain in not having the extra primary key, as the database would not need to calculate the index for the key. In the above model though this index (through the unique_index) is still calculated, but I believe that this is easier to understand.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight