Can I have multiple primary keys in a single table?
A Table can have a Composite Primary Key which is a primary key made from two or more columns. For example:
CREATE TABLE userdata (
userid INT,
userdataid INT,
info char(200),
primary key (userid, userdataid)
);
Update: Here is a link with a more detailed description of composite primary keys.
You can only have one primary key, but you can have multiple columns in your primary key.
You can also have Unique Indexes on your table, which will work a bit like a primary key in that they will enforce unique values, and will speed up querying of those values.
A table can have multiple candidate keys. Each candidate key is a column or set of columns that are UNIQUE, taken together, and also NOT NULL. Thus, specifying values for all the columns of any candidate key is enough to determine that there is one row that meets the criteria, or no rows at all.
Candidate keys are a fundamental concept in the relational data model.
It's common practice, if multiple keys are present in one table, to designate one of the candidate keys as the primary key. It's also common practice to cause any foreign keys to the table to reference the primary key, rather than any other candidate key.
I recommend these practices, but there is nothing in the relational model that requires selecting a primary key among the candidate keys.
This is the answer for both the main question and for #Kalmi's question of
What would be the point of having multiple auto-generating columns?
This code below has a composite primary key. One of its columns is auto-incremented. This will work only in MyISAM. InnoDB will generate an error "ERROR 1075 (42000): Incorrect table definition; there can be only one auto column and it must be defined as a key".
DROP TABLE IF EXISTS `test`.`animals`;
CREATE TABLE `test`.`animals` (
`grp` char(30) NOT NULL,
`id` mediumint(9) NOT NULL AUTO_INCREMENT,
`name` char(30) NOT NULL,
PRIMARY KEY (`grp`,`id`)
) ENGINE=MyISAM;
INSERT INTO animals (grp,name) VALUES
('mammal','dog'),('mammal','cat'),
('bird','penguin'),('fish','lax'),('mammal','whale'),
('bird','ostrich');
SELECT * FROM animals ORDER BY grp,id;
Which returns:
+--------+----+---------+
| grp | id | name |
+--------+----+---------+
| fish | 1 | lax |
| mammal | 1 | dog |
| mammal | 2 | cat |
| mammal | 3 | whale |
| bird | 1 | penguin |
| bird | 2 | ostrich |
+--------+----+---------+
(Have been studying these, a lot)
Candidate keys - A minimal column combination required to uniquely identify a table row.
Compound keys - 2 or more columns.
Multiple Candidate keys can exist in a table.
Primary KEY - Only one of the candidate keys that is chosen by us
Alternate keys - All other candidate keys
Both Primary Key & Alternate keys can be Compound keys
Sources:
https://en.wikipedia.org/wiki/Superkey
https://en.wikipedia.org/wiki/Candidate_key
https://en.wikipedia.org/wiki/Primary_key
https://en.wikipedia.org/wiki/Compound_key
As noted by the others it is possible to have multi-column primary keys.
It should be noted however that if you have some functional dependencies that are not introduced by a key, you should consider normalizing your relation.
Example:
Person(id, name, email, street, zip_code, area)
There can be a functional dependency between id -> name,email, street, zip_code and area
But often a zip_code is associated with a area and thus there is an internal functional dependecy between zip_code -> area.
Thus one may consider splitting it into another table:
Person(id, name, email, street, zip_code)
Area(zip_code, name)
So that it is consistent with the third normal form.
Primary Key is very unfortunate notation, because of the connotation of "Primary" and the subconscious association in consequence with the Logical Model. I thus avoid using it. Instead I refer to the Surrogate Key of the Physical Model and the Natural Key(s) of the Logical Model.
It is important that the Logical Model for every Entity have at least one set of "business attributes" which comprise a Key for the entity. Boyce, Codd, Date et al refer to these in the Relational Model as Candidate Keys. When we then build tables for these Entities their Candidate Keys become Natural Keys in those tables. It is only through those Natural Keys that users are able to uniquely identify rows in the tables; as surrogate keys should always be hidden from users. This is because Surrogate Keys have no business meaning.
However the Physical Model for our tables will in many instances be inefficient without a Surrogate Key. Recall that non-covered columns for a non-clustered index can only be found (in general) through a Key Lookup into the clustered index (ignore tables implemented as heaps for a moment). When our available Natural Key(s) are wide this (1) widens the width of our non-clustered leaf nodes, increasing storage requirements and read accesses for seeks and scans of that non-clustered index; and (2) reduces fan-out from our clustered index increasing index height and index size, again increasing reads and storage requirements for our clustered indexes; and (3) increases cache requirements for our clustered indexes. chasing other indexes and data out of cache.
This is where a small Surrogate Key, designated to the RDBMS as "the Primary Key" proves beneficial. When set as the clustering key, so as to be used for key lookups into the clustered index from non-clustered indexes and foreign key lookups from related tables, all these disadvantages disappear. Our clustered index fan-outs increase again to reduce clustered index height and size, reduce cache load for our clustered indexes, decrease reads when accessing data through any mechanism (whether index scan, index seek, non-clustered key lookup or foreign key lookup) and decrease storage requirements for both clustered and nonclustered indexes of our tables.
Note that these benefits only occur when the surrogate key is both small and the clustering key. If a GUID is used as the clustering key the situation will often be worse than if the smallest available Natural Key had been used. If the table is organized as a heap then the 8-byte (heap) RowID will be used for key lookups, which is better than a 16-byte GUID but less performant than a 4-byte integer.
If a GUID must be used due to business constraints than the search for a better clustering key is worthwhile. If for example a small site identifier and 4-byte "site-sequence-number" is feasible then that design might give better performance than a GUID as Surrogate Key.
If the consequences of a heap (hash join perhaps) make that the preferred storage then the costs of a wider clustering key need to be balanced into the trade-off analysis.
Consider this example::
ALTER TABLE Persons
ADD CONSTRAINT pk_PersonID PRIMARY KEY (P_Id,LastName)
where the tuple "(P_Id,LastName)" requires a uniqueness constraint, and may be a lengthy Unicode LastName plus a 4-byte integer, it would be desirable to (1) declaratively enforce this constraint as "ADD CONSTRAINT pk_PersonID UNIQUE NONCLUSTERED (P_Id,LastName)" and (2) separately declare a small Surrogate Key to be the "Primary Key" of a clustered index. It is worth noting that Anita possibly only wishes to add the LastName to this constraint in order to make that a covered field, which is unnecessary in a clustered index because ALL fields are covered by it.
The ability in SQL Server to designate a Primary Key as nonclustered is an unfortunate historical circumstance, due to a conflation of the meaning "preferred natural or candidate key" (from the Logical Model) with the meaning "lookup key in storage" from the Physical Model. My understanding is that originally SYBASE SQL Server always used a 4-byte RowID, whether into a heap or a clustered index, as the "lookup key in storage" from the Physical Model.
A primary key is the key that uniquely identifies a record and is used in all indexes. This is why you can't have more than one. It is also generally the key that is used in joining to child tables but this is not a requirement. The real purpose of a PK is to make sure that something allows you to uniquely identify a record so that data changes affect the correct record and so that indexes can be created.
However, you can put multiple fields in one primary key (a composite PK). This will make your joins slower (espcially if they are larger string type fields) and your indexes larger but it may remove the need to do joins in some of the child tables, so as far as performance and design, take it on a case by case basis. When you do this, each field itself is not unique, but the combination of them is. If one or more of the fields in a composite key should also be unique, then you need a unique index on it. It is likely though that if one field is unique, this is a better candidate for the PK.
Now at times, you have more than one candidate for the PK. In this case you choose one as the PK or use a surrogate key (I personally prefer surrogate keys for this instance). And (this is critical!) you add unique indexes to each of the candidate keys that were not chosen as the PK. If the data needs to be unique, it needs a unique index whether it is the PK or not. This is a data integrity issue. (Note this is also true anytime you use a surrogate key; people get into trouble with surrogate keys because they forget to create unique indexes on the candidate keys.)
There are occasionally times when you want more than one surrogate key (which are usually the PK if you have them). In this case what you want isn't more PK's, it is more fields with autogenerated keys. Most DBs don't allow this, but there are ways of getting around it. First consider if the second field could be calculated based on the first autogenerated key (Field1 * -1 for instance) or perhaps the need for a second autogenerated key really means you should create a related table. Related tables can be in a one-to-one relationship. You would enforce that by adding the PK from the parent table to the child table and then adding the new autogenerated field to the table and then whatever fields are appropriate for this table. Then choose one of the two keys as the PK and put a unique index on the other (the autogenerated field does not have to be a PK). And make sure to add the FK to the field that is in the parent table. In general if you have no additional fields for the child table, you need to examine why you think you need two autogenerated fields.
Some people use the term "primary key" to mean exactly an integer column that gets its values generated by some automatic mechanism. For example AUTO_INCREMENT in MySQL or IDENTITY in Microsoft SQL Server. Are you using primary key in this sense?
If so, the answer depends on the brand of database you're using. In MySQL, you can't do this, you get an error:
mysql> create table foo (
id int primary key auto_increment,
id2 int auto_increment
);
ERROR 1075 (42000): Incorrect table definition;
there can be only one auto column and it must be defined as a key
In some other brands of database, you are able to define more than one auto-generating column in a table.
Having two primary keys at the same time, is not possible. But (assuming that you have not messed the case up with composite key), may be what you might need is to make one attribute unique.
CREATE t1(
c1 int NOT NULL,
c2 int NOT NULL UNIQUE,
...,
PRIMARY KEY (c1)
);
However note that in relational database a 'super key' is a subset of attributes which uniquely identify a tuple or row in a table. A 'key' is a 'super key' that has an additional property that removing any attribute from the key, makes that key no more a 'super key'(or simply a 'key' is a minimal super key). If there are more keys, all of them are candidate keys. We select one of the candidate keys as a primary key. That's why talking about multiple primary keys for a one relation or table is being a conflict.
Good technical answers were given in better way than I can do.
I am only can add to this topic:
If you want something that not allowed/acceptable it is good reason to take step back.
Understand the core of why it's not acceptable.
Dig more in documentation/journal articles/web and etc.
Analyze/review current design and point major flaws.
Consider and test every step during new design.
Always look forward and try to create adaptive solution.
Hope it will helps someone.
Yes, Its possible in SQL,
but we can't set more than one primary keys in MsAccess.
Then, I don't know about the other databases.
CREATE TABLE CHAPTER (
BOOK_ISBN VARCHAR(50) NOT NULL,
IDX INT NOT NULL,
TITLE VARCHAR(100) NOT NULL,
NUM_OF_PAGES INT,
PRIMARY KEY (BOOK_ISBN, IDX)
);
Related
I'm not asking HOW to do this, but if it's what I SHOULD be doing.
Two employees can be working on the same job. So of course, both FKs, EmployeeID and JobID, can have a MANY relationship in a "Employee_Jobs" table.
Let's take Employee A, Employee B, Job A and Job B. All of the following would be acceptable:
A A
A B
B A
B B
What would NOT be acceptable is a duplicate of a combination of these two PKs... since we cannot have for example, [Employee A working on Job A] twice.
So would it be correct to say that the only way to manage this is to make the combination of the two PKs, EmployeeID and JobID, a Unique, non-clustered index?
I tried to think of how to instead, break this up to more tables but I keep getting back to this same problem.
Yes, not only is it appropriate, but in fact, the combination of these two attributes should be the PRIMARY KEY.
and in any other table where the entity represented by rows in the table has a logical attribute (consisting of the two columns employeeId and JobId), which represents the work done by an employee on a job, (or the contribution of the employee to a job, or the association of an employee to a job in any way), a FK in that table should be a composite Foreign Key consisting of these same two columns.
If you are using a surrogate key on this table to simplify joins and definition of Foreign Keys in other tables, then by all means continue to do so, but keep the two-column natural key in this table, as either a unique index or a Alternate Key. (a Key is a Key - anything that is declared or defined to be unique) so as to ensure data integrity in this table. In fact, to make it clear to users of the schema, when this situation comes up, I generally make the composite Natural Key the PRIMARY KEY, and add/define the surrogate (which is used in Joins and Other table FKs), as an alternate key or unique index. This is pretty much only a semantic distinction, only as they create almost identical functionality. But because data integrity is more important to me than join syntax and Foreign Key structure, To me, the Natural Key is the PRIMARY key,
Yes, In that case you should consider making both those fields as primary key; in specific a composite primary key or compound primary key like below which will make sure uniqueness of combination of both the fields.
primary key (EmployeeID , JobID)
Though as you said a Unique, non-clustered index but marking both the field as primary key will create a UNIQUE Clustered Index on them actually.
Making a primary key in a table in database is fine. Making a Composite Primary is also fine. But why cant I have 2 primary keys in a table? What kind of problems may occur if we have 2 primary keys.
Suppose I have a Students table. I don't want Roll No. and Names of each student to be unique. Then why can't I create 2 primary keys in a table? I don't see any logical problem in it now. But definitely I am missing a serious issue that's the reason it does not exist.
I am new in databases, so don't have much idea. It may also create a technical issue rather. Will be happy if someone can educate me on this.
Thanks.
You can create a UNIQUE constraint for both columns UNIQUE(roll,name).
The PK is unique by definition, cause it is used to identify a row from the others, for example, when a foreign key references that table, it is referencing the PK.
If you need another column to 'act' like a PK, give it the attributes unique and not null.
Well, this is simply by definition. There can not be two "primary" conditions, just like there can not be two "latest" versions.
Every table can contain more than one unique keys, but if you decide to have a primary key, this is just one of these unique keys, the "one" you deem the "most important", which identifies every record uniquely.
If you have a table and come to the conclusion that your primary key does not uniquely identify each record (also meaning that there can't be two records with the same values for the primary key), you have chosen the wrong primary key, as by definition, the fields of the primary key must uniquely define each record.
That, however, does not mean there can be no other combination of fields uniquely identifying the record! This is where a second feature kicks in: referential integrity.
You can "link" tables using their primary key. For example: If you have a Customer table and an Orders table, where the Customers table has a primary key on the customer number and the Orders table has a primary key on the order number and the customer number, that means:
Every customer can be identified uniquely by his customer number
Every order is uniquely identified by the order number and the customer number
You can then link the two tables on the customer number. The DB system then ensures several things, among which is the fact that you can not remove a customer who has orders in your database without first removing the orders. Otherwise, you would have orders without being able to find out the customer data, which would violate your database's referential integrity.
If you had two primary keys, the system would not know on which to ensure referential integrity, so you'd have to tell the system which key to use - which would make one of the primary keys more important, which would make it the "primary key" (!) of the primary keys.
You can have multiple candidate keys in a table but by convention only one key per table is called "primary". That's just a convention though and it doesn't make any real difference to the function of the keys. A primary key is no different to any other candidate key. If you find it convenient to call more than one key "primary" then I suggest you do so. In my opinion (I'm not the only one) the idea of designating a "primary" key at all is essentially an outdated concept of very little importance in database design.
You might be interested to know that early papers on the relational database model (e.g. by E.F.Codd, the relational model's inventor) actually used the term "primary key" to describe all the keys of a relation and not just one. So there is a perfectly good precedent for multiple primary keys per table. The idea of designating exactly one primary key is more recent and probably came into common use through the popularity of ER modelling techniques.
Create an unique index on the 2nd attribute (Names), it's almost the same as primary key with another name.
From Wikipedia (http://en.wikipedia.org/wiki/Unique_key):
A table can have at most one primary key, but more than one unique
key. A primary key is a combination of columns which uniquely specify
a row. It is a special case of unique keys. One difference is that
primary keys have an implicit NOT NULL constraint while unique keys do
not. Thus, the values in unique key columns may or may not be NULL,
and in fact such a column may contain at most one NULL fields.
Another difference is that primary keys must be defined using another
syntax.
Can anyone tell me what is the difference between a primary key and index key. And when to use which?
A primary key is a special kind of index in that:
there can be only one;
it cannot be nullable; and
it must be unique.
You tend to use the primary key as the most natural unique identifier for a row (such as social security number, employee ID and so forth, although there is a school of thought that you should always use an artificial surrogate key for this).
Indexes, on the other hand, can be used for fast retrieval based on other columns. For example, an employee database may have your employee number as the primary key but it may also have an index on your last name or your department.
Both of these indexes (last name and department) would disallow NULLs (probably) and allow duplicates (almost certainly), and they would be useful to speed up queries looking for anyone with (for example) the last name 'Corleone' or working in the 'HitMan' department.
A key (minimal superkey) is a set of attributes, the values of which are unique for every tuple (every row in the table at some point in time).
An index is a performance optimisation feature that enables data to be accessed faster.
Keys are frequently good candidates for indexing and some DBMSs automatically create indexes for keys, but that doesn't have to be so.
The phrase "index key" mixes these two quite different words and might be best avoided if you want to avoid any confusion. "Index key" is sometimes used to mean "the set of attributes in an index". However the set of attributes in question are not necessarily a key because they may not be unique.
Oracle Database enforces a UNIQUE key or PRIMARY KEY integrity constraint on a table by creating a unique index on the unique key or primary key. This index is automatically created by the database when the constraint is enabled.
You can create indexes explicitly (outside of integrity constraints) using the SQL statement CREATE INDEX .
Indexes can be unique or non-unique. Unique indexes guarantee that no two rows of a table have duplicate values in the key column (or columns). Non-unique indexes do not impose this restriction on the column values.
Use the CREATE UNIQUE INDEX statement to create a unique index.
Specifying the Index Associated with a Constraint
If you require more explicit control over the indexes associated with UNIQUE and PRIMARY KEY constraints, the database lets you:
1. Specify an existing index that the database is to use
to enforce the constraint
2. Specify a CREATE INDEX statement that the database is to use to create
the index and enforce the constraint
These options are specified using the USING INDEX clause.
Example:
CREATE TABLE a (
a1 INT PRIMARY KEY USING INDEX (create index ai on a (a1)));
http://docs.oracle.com/cd/B28359_01/server.111/b28310/indexes003.htm
Other responses are defining the Primary Key, but not the Primary Index.
A Primary Index isn't an index on the Primary Key.
A Primary Index is your table's data structure, but only if your data structure is ordered by the Primary Key, thus allowing efficient lookups without a requiring a separate data structure to look up records by the Primary Key.
All databases (that I'm aware of) have a Primary Key.
Not all databases have a Primary Index. Most of those that don't build a secondary index on the Primary Key by default.
Although I'm guilty of this crime, it seems to me there can't be any good reason for a table to not have an identity field primary key.
Pros:
- whether you want to or not, you can now uniquely identify every row in your table which previously you could not do
- you can't do sql replication without a primary key on your table
Cons:
- an extra 32 bits for each row of your table
Consider for example the case where you need to store user settings in a table in your database. You have a column for the setting name and a column for the setting value. No primary key is necessary, but having an integer identity column and using it as your primary key seems like a best practice for any table you ever create.
Are there other reasons besides size that every table shouldn't just have an integer identity field?
Sure, an example in a single-database solution is if you have a table of countries, it probably makes more sense to use the ISO 3166-1-alpha-2 country code as the primary key as this is an international standard, and makes queries much more readable (e.g. CountryCode = 'GB' as opposed to CountryCode = 28). A similar argument could be applied to ISO 4217 currency codes.
In a SQL Server database solution using replication, a UNIQUEIDENTIFIER key would make more sense as GUIDs are required for some types of replication (and also make it much easier to avoid key conflicts if there are multiple source databases!).
The most clear example of a table that doesn't need a surrogate key is a many-to-many relation:
CREATE TABLE Authorship (
author_id INT NOT NULL,
book_id INT NOT NULL,
PRIMARY KEY (author_id, book_id),
FOREIGN KEY (author_id) REFERENCES Authors (author_id),
FOREIGN KEY (book_id) REFERENCES Books (book_id)
);
I also prefer a natural key when I design a tagging system:
CREATE TABLE Tags (
tag VARCHAR(20) PRIMARY KEY
);
CREATE TABLE ArticlesTagged (
article_id INT NOT NULL,
tag VARCHAR(20) NOT NULL,
PRIMARY KEY (article_id, tag),
FOREIGN KEY (article_id) REFERENCES Articles (article_id),
FOREIGN KEY (tag) REFERENCES Tags (tag)
);
This has some advantages over using a surrogate "tag_id" key:
You can ensure tags are unique, without adding a superfluous UNIQUE constraint.
You prevent two distinct tags from having the exact same spelling.
Dependent tables that reference the tag already have the tag text; they don't need to join to Tags to get the text.
Every table should have a primary key. It doesn't matter if it's an integer, GUID, or the "setting name" column. The type depends on the requirements of the application. Ideally, if you are going to join the table to another, it would be best to use a GUID or integer as your primary key.
Yes, there are good reasons. You can have semantically meaningful true keys, rather than articificial identity keys. Also, it is not a good idea to have a seperate autoincrementing primary key for a Many-Many table. There are some reasons you might want to choose a GUID.
That being said, I typically use autoincrementing 64bit integers for primary keys.
Every table should have a primary key. But it doesn't need to be a single field identifier. Take for example in a finance system, you may have the primary key on a journal table being the Journal ID and Line No. This will produce a unique combination for each row (and the Journal ID will be a primary key in its own table)
Your primary key needs to be defined on how you are going to link the table to other tables.
I don't think every table needs a primary key. Sometimes you only want to "connect" the contents of two tables - via their primary key.
So you have a table like users and one table like groups (each with primary keys) and you have a third table called users_groups with only two colums (user and group) where users and groups are connected with each other.
For example a row with user = 3 and group = 6 would link the user with primary key 3 to the group with primary key 6.
One reason not to have primary key defined as identity is having primary key defined as GUIDs or populated with externally generated values.
In general, every table that is semantically meaningful by itself should have primary key and such key should have no semantic meaning. A join table that realizes many-to-many relationship is not meaningful by itself and so it doesn't need such primary key (it already has one via its values).
To be a properly normalised table, each row should only have a single identifiable key. Many tables will already have natural keys, such a unique invoice number. I agree, especially with storage being so cheap, there is little overhead in having an autonumber/identity key on all tables, but in this instance which is the real key.
Another area where I personally don't use this approach if for reference data, where typically we have a Description and a Value
Code, Description
'L', 'Live'
'O', 'Old'
'P', 'Pending'
In this situation making code a primary key ensures no duplicates, and is more human readable.
The key difference (sorry) between a natural primary key and a surrogate primary key is that the value of the natural key contains information whereas the value of a surrogate key doesn't.
Why is this important? Well a natural primary key is by definition guaranteed to be unique, but its value is not usually guaranteed to stay the same. When it changes, you have to update it in multiple places.
A surrogate key's value has no real meaning and simply serves to identify that row, so it never needs to be changed. It is a feature of the model rather than the domain itself.
So the only place I would say a surrogate key isn't appropriate is in an association table which only contains columns referring to rows in other tables (most many-to-many relations). The only information this table carries is the association between two (or more) rows, and it already consists solely of surrogate key values. In this case I would choose a composite primary key.
If such a table had bag semantics, or carried additional information about the association, I would add a surrogate key.
A primary key is ALWAYS a good idea. It allows for very fast and easy joining of tables. It aides external tools that can read system tables to make join allowing less skilled people to create their own queries by drag-and-drop. It also makes the implementation of referential integrity a breeze and that is a good idea from the get go.
I know for sure that some very smart people working for web giants do this. While I don't know why their own reasons, I know 2 cases where PK-less tables make sense:
Importing data. The table is temporary. Insertions and whole table scans need to be as fast as possible. Also, we need to accept duplicate records. Later we will clean the data, but the import process needs to work.
Analytics in a DBMS. Identifying a row is not useful - if we need to do it, it is not analytics. We just need a non-relational, redundant, horrible blob that looks like a table. We will build summary tables or materialized views by writing proper SQL queries.
Note that these cases have good reasons to be non-relational. But normally your tables should be relational, so... yes, they need a primary key.
I'm currently designing a brand new database. In school, we always learned to put a primary key in each table.
I read a lot of articles/discussions/newsgroups posts saying that it's better to use unique constraint (aka unique index for some db) instead of PK.
What's your point of view?
A Primary Key is really just a candidate key that does not allow for NULL. As such, in SQL terms - it's no different than any other unique key.
However, for our non-theoretical RDBMS's, you should have a Primary Key - I've never heard it argued otherwise. If that Primary Key is a surrogate key, then you should also have unique constraints on the natural key(s).
The important bit to walk away with is that you should have unique constraints on all the candidate (whether natural or surrogate) keys. You should then pick the one that is easiest to reference in a Foreign Key to be your Primary Key*.
You should also have a clustered index*. this could be your Primary Key, or a natural key - but it's not required to be either. You should pick your clustered index based on query usage of the table. When in doubt, the Primary Key is not a bad first choice.
Though it's technically only required to refer to a unique key in a foreign key relationship, it's accepted standard practice to greatly favor the primary key. In fact, I wouldn't be surprised if some RDBMS only allow primary key references.
Edit: It's been pointed out that Oracle's term of "clustered table" and "clustered index" are different than Sql Server. The equivalent of what I'm speaking of in Oracle-ese is an Index Ordered Table and it is recommended for OLTP tables - which, I think, would be the main focus of SO questions. I assume if you're responsible for a large OLAP data warehouse, you should already have your own opinions on database design and optimization.
Can you provide references to these articles?
I see no reason to change the tried and true methods. After all, Primary Keys are a fundamental design feature of relational databases.
Using UNIQUE to serve the same purpose sounds really hackish to me. What is their rationale?
Edit: My attention just got drawn back to this old answer. Perhaps the discussion that you read regarding PK vs. UNIQUE dealt with people making something a PK for the sole purpose of enforcing uniqueness on it. The answer to this is, If it IS a key, then make it key, otherwise make it UNIQUE.
A primary key is just a candidate key (unique constraint) singled out for special treatment (automatic creation of indexes, etc).
I expect that the folks who argue against them see no reason to treat one key differently than another. That's where I stand.
[Edit] Apparently I can't comment even on my own answer without 50 points.
#chris: I don't think there's any harm. "Primary Key" is really just syntactic sugar. I use them all the time, but I certainly don't think they're required. A unique key is required, yes, but not necessarily a Primary Key.
It would be very rare denormalization that would make you want to have a table without a primary key. Primary keys have unique constraints automatically just by their nature as the PK.
A unique constraint would be used when you want to guarantee uniqueness in a column in ADDITION to the primary key.
The rule of always have a PK is a good one.
http://msdn.microsoft.com/en-us/library/ms191166.aspx
You should always have a primary key.
However I suspect your question is just worded bit misleading, and you actually mean to ask if the primary key should always be an automatically generated number (also known as surrogate key), or some unique field which is actual meaningful data (also known as natural key), like SSN for people, ISBN for books and so on.
This question is an age old religious war in the DB field.
My take is that natural keys are preferable if they indeed are unique and never change. However, you should be careful, even something seemingly stable like a persons SSN may change under certain circumstances.
Unless the table is a temporary table to stage the data while you work on it, you always want to put a primary key on the table and here's why:
1 - a unique constraint can allow nulls but a primary key never allows nulls. If you run a query with a join on columns with null values you eliminate those rows from the resulting data set because null is not equal to null. This is how even big companies can make accounting errors and have to restate their profits. Their queries didn't show certain rows that should have been included in the total because there were null values in some of the columns of their unique index. Shoulda used a primary key.
2 - a unique index will automatically be placed on the primary key, so you don't have to create one.
3 - most database engines will automatically put a clustered index on the primary key, making queries faster because the rows are stored contiguously in the data blocks. (This can be altered to place the clustered index on a different index if that would speed up the queries.) If a table doesn't have a clustered index, the rows won't be stored contiguously in the data blocks, making the queries slower because the read/write head has to travel all over the disk to pick up the data.
4 - many front end development environments require a primary key in order to update the table or make deletions.
Primary keys should be used in situations where you will be establishing relationships from this table to other tables that will reference this value. However, depending on the nature of the table and the data that you're thinking of applying the unique constraint to, you may be able to use that particular field as a natural primary key rather than having to establish a surrogate key. Of course, surrogate vs natural keys are a whole other discussion. :)
Unique keys can be used if there will be no relationship established between this table and other tables. For example, a table that contains a list of valid email addresses that will be compared against before inserting a new user record or some such. Or unique keys can be used when you have values in a table that has a primary key but must also be absolutely unique. For example, if you have a users table that has a user name. You wouldn't want to use the user name as the primary key, but it must also be unique in order for it to be used for log in purposes.
We need to make a distinction here between logical constructs and physical constructs, and similarly between theory and practice.
To begin with: from a theoretical perspective, if you don't have a primary key, you don't have a table. It's just that simple. So, your question isn't whether your table should have a primary key (of course it should) but how you label it within your RDBMS.
At the physical level, most RDBMSs implement the Primary Key constraint as a Unique Index. If your chosen RDBMS is one of these, there's probably not much practical difference, between designating a column as a Primary Key and simply putting a unique constraint on the column. However: one of these options captures your intent, and the other doesn't. So, the decision is a no-brainer.
Furthermore, some RDBMSs make additional features available if Primary Keys are properly labelled, such as diagramming, and semi-automated foreign-key-constraint support.
Anyone who tells you to use Unique Constraints instead of Primary Keys as a general rule should provide a pretty damned good reason.
the thing is that a primary key can be one or more columns which uniquely identify a single record of a table, where a Unique Constraint is just a constraint on a field which allows only a single instance of any given data element in a table.
PERSONALLY, I use either GUID or auto-incrementing BIGINTS (Identity Insert for SQL SERVER) for unique keys utilized for cross referencing amongst my tables. Then I'll use other data to allow the user to select specific records.
For example, I'll have a list of employees, and have a GUID attached to every record that I use behind the scenes, but when the user selects an employee, they're selecting them based off of the following fields: LastName + FirstName + EmployeeNumber.
My primary key in this scenario is LastName + FirstName + EmployeeNumber while unique key is the associated GUID.
posts saying that it's better to use unique constraint (aka unique index for some db) instead of PK
i guess that the only point here is the same old discussion "natural vs surrogate keys", because unique indexes and pk´s are the same thing.
translating:
posts saying that it's better to use natural key instead of surrogate key
I usually use both PK and UNIQUE KEY. Because even if you don't denote PK in your schema, one is always generated for you internally. It's true both for SQL Server 2005 and MySQL 5.
But I don't use the PK column in my SQLs. It is for management purposes like DELETEing some erroneous rows, finding out gaps between PK values if it's set to AUTO INCREMENT. And, it makes sense to have a PK as numbers, not a set of columns or char arrays.
I've written a lot on this subject: if you read anything of mine be clear that I was probably referring specifically to Jet a.k.a. MS Access.
In Jet, the tables are physically ordered on the PRIMARY KEY using a non-maintained clustered index (is clustered on compact). If the table has no PK but does have candidate keys defined using UNIQUE constraints on NOT NULL columns then the engine will pick one for the clustered index (if your table has no clustered index then it is called a heap, arguably not a table at all!) How does the engine pick a candidate key? Can it pick one which includes nullable columns? I really don't know. The point is that in Jet the only explicit way of specifying the clustered index to the engine is to use PRIMARY KEY. There are of course other uses for the PK in Jet e.g. it will be used as the key if one is omitted from a FOREIGN KEY declaration in SQL DDL but again why not be explicit.
The trouble with Jet is that most people who create tables are unaware of or unconcerned about clustered indexes. In fact, most users (I wager) put an autoincrement Autonumber column on every table and define the PRIMARY KEY solely on this column while failing to put any unique constraints on the natural key and candidate keys (whether an autoincrement column can actually be regarded as a key without exposing it to end users is another discussion in itself). I won't go into detail about clustered indexes here but suffice to say that IMO a sole autoincrement column is rarely to ideal choice.
Whatever you SQL engine, the choice of PRIMARY KEY is arbitrary and engine specific. Usually the engine will apply special meaning to the PK, therefore you should find out what it is and use it to your advantage. I encourage people to use NOT NULL UNIQUE constraints in the hope they will give greater consideration to all candidate keys, especially when they have chosen to use 'autonumber' columns which (should) have no meaning in the data model. But I'd rather folk choose one well considered key and used PRIMARY KEY rather than putting it on the autoincrement column out of habit.
Should all tables have a PK? I say yes because doing otherwise means at the very least you are missing out on a slight advantage the engine affords the PK and at worst you have no data integrity.
BTW Chris OC makes a good point here about temporal tables, which require sequenced primary keys (lowercase) which cannot be implemented via simple PRIMARY KEY constraints (SQL key words in uppercase).
PRIMARY KEY
1. Null
It doesn’t allow Null values. Because of this we refer PRIMARY KEY =
UNIQUE KEY + Not Null CONSTRAINT.
2. INDEX
By default it adds a clustered index.
3. LIMIT
A table can have only one PRIMARY KEY Column[s].
UNIQUE KEY
1. Null
Allows Null value. But only one Null value.
2. INDEX
By default it adds a UNIQUE non-clustered index.
3. LIMIT
A table can have more than one UNIQUE Key Column[s].
If you plan on using LINQ-to-SQL, your tables will require Primary Keys if you plan on performing updates, and they will require a timestamp column if you plan on working in a disconnected environment (such as passing an object through a WCF service application).
If you like .NET, PK's and FK's are your friends.
I submit that you may need both. Primary keys by nature need to be unique and not nullable. They are often surrogate keys as integers create faster joins than character fileds and especially than multiple field character joins. However, as these are often autogenerated, they do not guarantee uniqueness of the data record excluding the id itself. If your table has a natural key that should be unique, you should have a unique index on it to prevent data entry of duplicates. This is a basic data integrity requirement.
Edited to add: It is also a real problem that real world data often does not have a natural key that truly guarantees uniqueness in a normalized table structure, especially if the database is people centered. Names, even name, address and phone number combined (think father and son in the same medical practice) are not necessarily unique.
I was thinking of this problem my self. If you are using unique, you will hurt the 2. NF. According to this every non-pk-attribute has to be depending on the PK. The pair of attributes in this unique constraint are to be considered as part of the PK.
sorry for replying to this 7 years later but didn't want to start a new discussion.