Tables with a common primary key - database

What's the term describing the relationship between tables that share a common primary key?
Here's an example:
Table 1
property(property_id, property_location, property_price, ...);
Table 2
flat(property_id, flat_floor, flat_bedroom_count, ...);

What you have looks like table inheritance. If your table structure is that all flat records represent a single property but not all property records refer to a flat, then that's table inheritance. It's a way of modeling something close to object-oriented relationships (in other words, flat inherits from property) in a relational database.

If I understand your example correctly, the data modeling term is Supertype/Subtype. This is a modeling technique where you define a root table (the supertype) containing common attributes, and one or more referencing tables (subtypes) that contain varying attributes based on the entities being modeled.
For example, you could have a Person table (the supertype) containing columns for attributes pertaining to all people, such as Name. You could then have an Employee table (the subtype) containing attributes specific to employees only, such as rate of pay and hire date. You could then continue this process with additional tables for other specializations of Person, such as Contractor. Each of the subtype tables would have a PersonID key column, which could be the primary key of the subtype table, as well as a foreign key referencing the Person table.
For additional info, search Google for "supertype and subtype entities", and see the links below.
http://www.learndatamodeling.com/dm_super_type.htm
http://technet.microsoft.com/en-us/library/cc505839.aspx

There isn't a good name for this relationship in common database terminology (as far as I know). It's not a one-to-one relationship because there isn't guaranteed to be a record in the "extending" table for each record in the main table. It's not a one-to-many relationship because there a maximum of one record allowed on what would otherwise be the "many" side of the relationship.
The best I can do is a one-to-one-or-none or a one-to-one-at-most relationship. (I will admit to sloppy terminology myself — I just call it a one-to-one relationship.)
Whatever you decide to call it, you can model it properly and maintain integrity in your database by making the property_id column in property a PK and the property_id column in flat a PK and also an FK back to property.

"Logic and Databases" advances the term "at most one to at most one" for this kind of relationship. (Note that it is insane to assign names to tables on account of which relationships they participate in.)
Beware of the people who have suggested things like "foreign key", "table inheritance", brief, all the other answers given here. Those people are making assumptions that you have not explicitly stated to be valid, namely that one of your two tables will be guaranteed to contain all key values that appear in the other.

(Disfunctionality of the site prevents me from adding this as a comment in the proper place.)
"How would you interpret "...that share a common primary key?" "
I interpret that in the only reasonable sense possible: that within table1, attribute values for the primary key are guaranteed to be unique, and that within table2, attribute values for the primary key are guaranteed to be unique. And that furthermore, the primary key in both tables has the same [set of] attribute names, and that the types corresponding to the primary key attribute[s] are also pairwise the same. Nothing more and nothing less.
In particular, "sharing a primary key" means "having a primary key in common", and that means in turn "having a certain 'internal uniqueness rule' in common", but that commonality guarantees in no way that a primary key value appearing in one table must also appear in the second table.
"Can you give an example involving two tables with shared primary keys where one table wouldn't contain all the key values that appear in the other?" "
Table1: column A of type integer, primary key A
Table2: column A of type integer, primary key A
Rows in table1: {A:1}. Satisfies the primary key for table1.
Rows in table2: {A:2}. Satisfies the primary key for table2.
Convinced ?

"Foreign key"?

Related

Normalization and primary keys

In a given table if there is no primary key and even impossible to create a composite primary key then what is the normal form of that table ?
If its zero(0NF) adding a new column and making it primary key will convert this table to 1NF ?
Normal forms apply to relations, which are mathematical structures. Tables can be used to represent relations, but this requires some rules to ensure that the table doesn't contain more or less information than the corresponding relation.
In order for a table to represent a relation:
all rows and columns must be unique
the order they're in mustn't matter
all significant information must be represented as values in cells (i.e. fonts, highlighting, etc, mustn't matter)
every cell must contain one value (doesn't matter how simple or complex that value is)
Also, the relational model cares about candidate keys, not primary keys. A relation can have multiple candidate keys. A primary key is just a selected candidate key that is used by some disciplines (e.g. the entity-relationship model) or by some database management systems (e.g. for physical record ordering).
With all that said, I can now answer your question. If your table follows the rules and specifically the rows are all unique, then there will be at least one candidate key, on all the columns together at worst. If your table's rows aren't unique, then the table doesn't represent a relation and the normal forms don't apply. A surrogate key (like an auto-increment column) can be added to identify rows uniquely, but that isn't necessarily sufficient on its own to make a table represent a relation (1NF).
BTW, I suggest you avoid using "0NF" or "UNF". Non-relational tables don't have a level of normalization, so attaching any kind of "NF" to them is misleading.
As long as you are talking about tables, there is one further case that needs to be covered. It's the case of duplicate rows.
Duplicate rows are rows that are identical in appearance but not in row number. Such a table cannot have a primary key. Sometimes duplicate rows represent the same information. Sometimes not.
For example, consider a table with just four columns: customerid, productid, quentity, price. If a customer orders the same product twice, we'll have two identical rows, representing different inforation. Ths is not good.
Note that the corresonding thing cannot happen with relations. If two tuples in a relation have the same appearance, then they are the same tuple.
As to the other points, they are covered by excellent earlier answers.
before you wan to check for normalization your table must have a Primary key(the primary key is playing lead role in Relational DB,...).
1NF: says that all of your table attributes must be single valued.
Answer of Question 1 : In a given table if there is no primary key and even impossible to create a composite primary key then what is the normal form of that table ?
Answer : If it is no primary key in relation and if it is impossible to create a composite primiary key(According to me your question says ,even if combine all the column of row to make candidate key then also it will not able to identify your relationship uniquly(duplicate rows are there), hence it is not in any normal form.
Answer of Question 2:
If you add some column(having unique values in it) and if all the cell contains only one value then it is in 1NF.
Still if you need some clarification can ask in comment box.
0NF is not any form of normalization. refer C.J. Date or Henry korth(database management system book)
Hope this helps.

Why does my database table need a primary key?

In my database I have a list of users with information about them, and I also have a feature which allows a user to add other users to a shortlist. My user information is stored in one table with a primary key of the user id, and I have another table for the shortlist. The shortlist table is designed so that it has two columns and is basically just a list of pairs of names. So to find the shortlist for a particular user you retrieve all names from the second column where the id in the first column is a particular value.
The issue is that according to many sources such as this Should each and every table have a primary key? you should have a primary key in every table of the database.
According to this source http://www.w3schools.com/sql/sql_primarykey.asp - a primary key in one which uniquely identifies an entry in a database. So my question is:
What is wrong with the table in my database? Why does it need a primary key?
How should I give it a primary key? Just create a new auto-incrementing column so that each entry has a unique id? There doesn't seem much point for this. Or would I somehow encapsulate the multiple entries that represent a shortlist into another entity in another table and link that in? I'm really confused.
If the rows are unique, you can have a two-column primary key, although maybe that's database dependent. Here's an example:
CREATE TABLE my_table
(
col_1 int NOT NULL,
col_2 varchar(255) NOT NULL,
CONSTRAINT pk_cols12 PRIMARY KEY (col_1,col_2)
)
If you already have the table, the example would be:
ALTER TABLE my_table
ADD CONSTRAINT pk_cols12 PRIMARY KEY (col_1,col_2)
Primary keys must identify each record uniquely and as it was mentioned before, primary keys can consist of multiple attributes (1 or more columns). First, I'd recommend making sure each record is really unique in your table. Secondly, as I understand you left the table without primary key and that's disallowed so yes, you will need to set the key for it.
In this particular case, there is no purpose in same pair of user IDs being stored more than once in the shortlist table. After all, that table models a set, and an element is either in the set or isn't. Having an element "twice" in the set makes no sense1. To prevent that, create a composite key, consisting of these two user ID fields.
Whether this composite key will also be primary, or you'll have another key (that would act as surrogate primary key) is another matter, but either way you'll need this composite key.
Please note that under databases that support clustering (aka. index-organized tables), PK is often also a clustering key, which may have significant repercussions on performance.
1 Unlike in mutiset.
A table with duplicate rows is not an adequate representation of a relation. It's a bag of rows, not a set of rows. If you let this happen, you'll eventually find that your counts will be off, your sums will be off, and your averages will be off. In short, you'll get confusing errors out of your data when you go to use it.
Declaring a primary key is a convenient way of preventing duplicate rows from getting into the database, even if one of the application programs makes a mistake. The index you obtain is a side effect.
Foreign key references to a single row in a table could be made by referencing any candidate key. However, it's much more convenient if you declare one of those candidate keys as a primary key, and then make all foreign key references refer to the primary key. It's just careful data management.
The one-to-one correspondence between entities in the real world and corresponding rows in the table for that entity is beyond the realm of the DBMS. It's up to your applications and even your data providers to maintain that correspondence by not inventing new rows for existing entities and not letting some new entities slip through the cracks.
Well since you are asking, it's good practice but in a few instances (no joins needed to the data) it may not be absolutely required. The biggest problem though is you never really know if requirements will change and so you really want one now so you aren't adding one to a 10m record table after the fact.....
In addition to a primary key (which can span multiple columns btw) I think it is good practice to have a secondary candidate key which is a single field. This makes joins easier.
First some theory. You may remember the definition of a function from HS or college algebra is that y = f(x) where f is a function if and only if for every x there is exactly one y. In this case, in relational math we would say that y is functionally dependent on x on this case.
The same is true of your data. Suppose we are storing check numbers, checking account numbers, and amounts. Assuming that we may have several checking accounts and that for each checking account duplicate check numbers are not allowed, then amount is functionally dependent on (account, check_number). In general you want to store data together which is functionally dependent on the same thing, with no transitive dependencies. A primary key will typically be the functional dependency you specify as the primary one. This then identifies the rest of the data in the row (because it is tied to that identifier). Think of this as the natural primary key. Where possible (i.e. not using MySQL) I like to declare the primary key to be the natural one, even if it spans across columns. This gets complicated sometimes where you may have multiple interchangeable candidate keys. For example, consider:
CREATE TABLE country (
id serial not null unique,
name text primary key,
short_name text not null unique
);
This table really could have any column be the primary key. All three are perfectly acceptable candidate keys. Suppose we have a country record (232, 'United States', 'US'). Each of these fields uniquely identifies the record so if we know one we can know the others. Each one could be defined as the primary key.
I also recommend having a second, artificial candidate key which is just a machine identifier used for linking for joins. In the above example country.id does this. This can be useful for linking other records to the country table.
An exception to needing a candidate key might be where duplicate records really are possible. For example, suppose we are tracking invoices. We may have a case where someone is invoiced independently for two items with one showing on each of two line items. These could be identical. In this case you probably want to add an artificial primary key because it allows you to join things to that record later. You might not have a need to do so now but you may in the future!
Create a composite primary key.
To read more about what a composite primary key is, visit
http://www.relationaldbdesign.com/relational-database-analysis/module2/concatenated-primary-keys.php

2 primary keys in a table

Making a primary key in a table in database is fine. Making a Composite Primary is also fine. But why cant I have 2 primary keys in a table? What kind of problems may occur if we have 2 primary keys.
Suppose I have a Students table. I don't want Roll No. and Names of each student to be unique. Then why can't I create 2 primary keys in a table? I don't see any logical problem in it now. But definitely I am missing a serious issue that's the reason it does not exist.
I am new in databases, so don't have much idea. It may also create a technical issue rather. Will be happy if someone can educate me on this.
Thanks.
You can create a UNIQUE constraint for both columns UNIQUE(roll,name).
The PK is unique by definition, cause it is used to identify a row from the others, for example, when a foreign key references that table, it is referencing the PK.
If you need another column to 'act' like a PK, give it the attributes unique and not null.
Well, this is simply by definition. There can not be two "primary" conditions, just like there can not be two "latest" versions.
Every table can contain more than one unique keys, but if you decide to have a primary key, this is just one of these unique keys, the "one" you deem the "most important", which identifies every record uniquely.
If you have a table and come to the conclusion that your primary key does not uniquely identify each record (also meaning that there can't be two records with the same values for the primary key), you have chosen the wrong primary key, as by definition, the fields of the primary key must uniquely define each record.
That, however, does not mean there can be no other combination of fields uniquely identifying the record! This is where a second feature kicks in: referential integrity.
You can "link" tables using their primary key. For example: If you have a Customer table and an Orders table, where the Customers table has a primary key on the customer number and the Orders table has a primary key on the order number and the customer number, that means:
Every customer can be identified uniquely by his customer number
Every order is uniquely identified by the order number and the customer number
You can then link the two tables on the customer number. The DB system then ensures several things, among which is the fact that you can not remove a customer who has orders in your database without first removing the orders. Otherwise, you would have orders without being able to find out the customer data, which would violate your database's referential integrity.
If you had two primary keys, the system would not know on which to ensure referential integrity, so you'd have to tell the system which key to use - which would make one of the primary keys more important, which would make it the "primary key" (!) of the primary keys.
You can have multiple candidate keys in a table but by convention only one key per table is called "primary". That's just a convention though and it doesn't make any real difference to the function of the keys. A primary key is no different to any other candidate key. If you find it convenient to call more than one key "primary" then I suggest you do so. In my opinion (I'm not the only one) the idea of designating a "primary" key at all is essentially an outdated concept of very little importance in database design.
You might be interested to know that early papers on the relational database model (e.g. by E.F.Codd, the relational model's inventor) actually used the term "primary key" to describe all the keys of a relation and not just one. So there is a perfectly good precedent for multiple primary keys per table. The idea of designating exactly one primary key is more recent and probably came into common use through the popularity of ER modelling techniques.
Create an unique index on the 2nd attribute (Names), it's almost the same as primary key with another name.
From Wikipedia (http://en.wikipedia.org/wiki/Unique_key):
A table can have at most one primary key, but more than one unique
key. A primary key is a combination of columns which uniquely specify
a row. It is a special case of unique keys. One difference is that
primary keys have an implicit NOT NULL constraint while unique keys do
not. Thus, the values in unique key columns may or may not be NULL,
and in fact such a column may contain at most one NULL fields.
Another difference is that primary keys must be defined using another
syntax.

Is there an answer matrix I can use to decide if I need a foreign key or not?

For example, I have a table that stores classes, and a table that stores class_attributes. class_attributes has a class_attribute_id and a class_id, while classes has a class_id.
I'd guess if a dataset is "a solely child of" or "belongs solely to" or "is solely owned by", then I need a FK to identify the parent. Without class_id in the class_attributes table I could never find out to which class this attribute belongs to.
Maybe there's an helpful answer matrix for this?
Wikipedia is helpful.
In the context of relational
databases, a foreign key is a
referential constraint between two
tables.1 The foreign key identifies
a column or a set of columns in one
(referencing) table that refers to a
column or set of columns in another
(referenced) table. The columns in the
referencing table must be the primary
key or other candidate key in the
referenced table.
(and it goes on into more and more detail)
If you want to enforce the constraint that each row in class_attributes applies to exactly one row of classes, you need a foreign key. If you don't care about enforcing this (ie, you're fine to have attributes for non-existent classes), you don't need an FK.
I don't have an answer matrix, but just for clarification purposes, we're talking about Database Normalization:
http://en.wikipedia.org/wiki/Database_normalization
And to a certain extent Denormalization:
http://en.wikipedia.org/wiki/Denormalization
I would say, it's the other way around. First, you design what kind of objects you need to have. For those will create a table.
Part of this phase is designing the keys, that is the combinations of attributes (columns) that uniquely identify the object. You may or may not add an artificial key or surrogate key for convenience or performance reasons. From these keys, you typically elect one canonical key, the primary key, which you try to use consistently to identify objects in that table (you keep the other keys too, they serve to ensure unicity as a business rule, not so much for identificattion purposes.)
Then, you think what relationships exist between the objects. An object that is 'owned' by another object, or an object that refers to another object needs some way to identify its related object. In the corresponding table (child table) you add columns to make a foreign key to point to the primary key of the referenced table.
This takes care of all one to many relationships.
Sometimes, an object can be related multiple times to another object. For example, an order can be used to order multiple products, but a product can appear on multiple orders as well. For those relationships, you design a separate table (intersection table - in this example, order_items). This table will have a unique key created from two foreign keys: one pointing to the one parent (orders), one to the other parent (products). And again, you add the columns to the intersection table that you need to create those foreign keys.
So in short, you first design keys and foreign keys, only then you start adding columns to implement them.
Don't be concerned with the type of relationship -- it has more to do with the cardinality of the relationship.
If you have a one-to-many relationship, then you'd want to assign a Primary Key to the smaller of the tables, and store it as a Foreign Key in the larger table.
You'd also do it with one-to-one relationships, but some people argue that you should avoid them.
In the case of a many-to-many relationship, you'd want to make a join table, and then have each of the original tables have a foreign key to the join table.

One or Two Primary Keys in Many-to-Many Table?

I have the following tables in my database that have a many-to-many relationship, which is expressed by a connecting table that has foreign keys to the primary keys of each of the main tables:
Widget: WidgetID (PK), Title, Price
User: UserID (PK), FirstName, LastName
Assume that each User-Widget combination is unique. I can see two options for how to structure the connecting table that defines the data relationship:
UserWidgets1: UserWidgetID (PK), WidgetID (FK), UserID (FK)
UserWidgets2: WidgetID (PK, FK), UserID (PK, FK)
Option 1 has a single column for the Primary Key. However, this seems unnecessary since the only data being stored in the table is the relationship between the two primary tables, and this relationship itself can form a unique key. Thus leading to option 2, which has a two-column primary key, but loses the one-column unique identifier that option 1 has. I could also optionally add a two-column unique index (WidgetID, UserID) to the first table.
Is there any real difference between the two performance-wise, or any reason to prefer one approach over the other for structuring the UserWidgets many-to-many table?
You only have one primary key in either case. The second one is what's called a compound key. There's no good reason for introducing a new column. In practise, you will have to keep a unique index on all candidate keys. Adding a new column buys you nothing but maintenance overhead.
Go with option 2.
Personally, I would have the synthetic/surrogate key column in many-to-many tables for the following reasons:
If you've used numeric synthetic keys in your entity tables then having the same on the relationship tables maintains consistency in design and naming convention.
It may be the case in the future that the many-to-many table itself becomes a parent entity to a subordinate entity that needs a unique reference to an individual row.
It's not really going to use that much additional disk space.
The synthetic key is not a replacement to the natural/compound key nor becomes the PRIMARY KEY for that table just because it's the first column in the table, so I partially agree with the Josh Berkus article. However, I don't agree that natural keys are always good candidates for PRIMARY KEY's and certainly should not be used if they are to be used as foreign keys in other tables.
Option 2 uses a simple compund key, option 1 uses a surrogate key. Option 2 is preferred in most scenarios and is close to the relational model in that it is a good candidate key.
There are situations where you may want to use a surrogate key (Option 1)
You are not certain that the compound key is a good candidate key over time. Particularly with temporal data (data that changes over time). What if you wanted to add another row to the UserWidget table with the same UserId and WidgetId? Think of Employment(EmployeeId,EmployeeId) - it would work in most cases except if someone went back to work for the same employer at a later date
If you are creating messages/business transactions or something similar that requires an easier key to use for integration. Replication maybe?
If you want to create your own auditing mechanisms (or similar) and don't want keys to get too long.
As a rule of thumb, when modeling data you will find that most associative entities (many to many) are the result of an event. Person takes up employment, item is added to basket etc. Most events have a temporal dependency on the event, where the date or time is relevant - in which case a surrogate key may be the best alternative.
So, take option 2, but make sure that you have the complete model.
I agree with the previous answers but I have one remark to add.
If you want to add more information to the relation and allow more relations between the same two entities you need option one.
For example if you want to track all the times user 1 has used widget 664 in the userwidget table the userid and widgetid isn't unique anymore.
What is the benefit of a primary key in this scenario? Consider the option of no primary key:
UserWidgets3: WidgetID (FK), UserID (FK)
If you want uniqueness then use either the compound key (UserWidgets2) or a uniqueness constraint.
The usual performance advantage of having a primary key is that you often query the table by the primary key, which is fast. In the case of many-to-many tables you don't usually query by the primary key so there is no performance benefit. Many-to-many tables are queried by their foreign keys, so you should consider adding indexes on WidgetID and UserID.
Option 2 is the correct answer, unless you have a really good reason to add a surrogate numeric key (which you have done in option 1).
Surrogate numeric key columns are not 'primary keys'. Primary keys are technically one of the combination of columns that uniquely identify a record within a table.
Anyone building a database should read this article http://it.toolbox.com/blogs/database-soup/primary-keyvil-part-i-7327 by Josh Berkus to understand the difference between surrogate numeric key columns and primary keys.
In my experience the only real reason to add a surrogate numeric key to your table is if your primary key is a compound key and needs to be used as a foreign key reference in another table. Only then should you even think to add an extra column to the table.
Whenever I see a database structure where every table has an 'id' column the chances are it has been designed by someone who doesn't appreciate the relational model and it will invariably display one or more of the problems identified in Josh's article.
I would go with both.
Hear me out:
The compound key is obviously the nice, correct way to go in so far as reflecting the meaning of your data goes. No question.
However: I have had all sorts of trouble making hibernate work properly unless you use a single generated primary key - a surrogate key.
So I would use a logical and physical data model. The logical one has the compound key. The physical model - which implements the logical model - has the surrogate key and foreign keys.
Since each User-Widget combination is unique, you should represent that in your table by making the combination unique. In other words, go with option 2. Otherwise you may have two entries with the same widget and user IDs but different user-widget IDs.
The userwidgetid in the first table is not needed, as like you said the uniqueness comes from the combination of the widgetid and the userid.
I would use the second table, keep the foriegn keys and add a unique index on widgetid and userid.
So:
userwidgets( widgetid(fk), userid(fk),
unique_index(widgetid, userid)
)
There is some preformance gain in not having the extra primary key, as the database would not need to calculate the index for the key. In the above model though this index (through the unique_index) is still calculated, but I believe that this is easier to understand.

Resources