How to store a "primary" record - database

Suppose I have the following tables:
Companies
--CompanyID
--CompanyName
and
Locations
--LocationID
--CompanyID
--LocationName
Every company has at least one location. I want to track a primary location for each company (and yes, every company will have exactly one primary location). What's the best way to set this up? Add a primaryLocationID in the Companies table?

Add a primaryLocationID in the Companies table?
Yes; however, that creates a circular reference (Companies references Locations, and Locations references Companies), which could prevent you from inserting new data.
One way to resolve this chicken-and-egg problem is simply to make Company.PrimaryLocationID NULL-able. You can then insert the company with a NULL PrimaryLocationID, insert its location, and finally set PrimaryLocationID, which in effect temporarily disables one of the circular FKs. Unfortunately, this means the database will enforce only a "1:0..1" relationship, not the strict "1:1" one (so you'll have to enforce that in application code).
However, if your DBMS supports deferred constraints (such as Oracle or PostgreSQL), you can simply defer one of the FKs to break the cycle while the transaction is still in progress. By the time the transaction commits, both FKs must be satisfied, resulting in a real "1:1" relationship.
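To make the deferred-constraint idea concrete, here is a minimal sketch using Python's built-in sqlite3 module, since SQLite also supports DEFERRABLE INITIALLY DEFERRED foreign keys once foreign key enforcement is switched on. The table layout follows the question; make_db and run_in_transaction are hypothetical helpers. The sketch does not check that a company's primary location actually belongs to that company.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # isolation_level=None: no implicit transactions, we issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE Company (
            CompanyID         INTEGER PRIMARY KEY,
            CompanyName       TEXT NOT NULL,
            -- NOT NULL, but the FK is deferred: it is checked at COMMIT time.
            PrimaryLocationID INTEGER NOT NULL
                REFERENCES Location (LocationID) DEFERRABLE INITIALLY DEFERRED
        );
        CREATE TABLE Location (
            LocationID   INTEGER PRIMARY KEY,
            -- Immediate FK: checked at the end of each statement.
            CompanyID    INTEGER NOT NULL REFERENCES Company (CompanyID),
            LocationName TEXT NOT NULL
        );
    """)
    return conn


def run_in_transaction(conn, statements):
    """Run (sql, params) pairs as one transaction; roll back on a constraint error."""
    conn.execute("BEGIN")
    try:
        for sql, params in statements:
            conn.execute(sql, params)
        conn.execute("COMMIT")  # deferred FKs are verified here
    except sqlite3.IntegrityError:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


conn = make_db()
# The company can go first because its FK to Location is deferred.
run_in_transaction(conn, [
    ("INSERT INTO Company VALUES (?, ?, ?)", (1, "Acme", 10)),
    ("INSERT INTO Location VALUES (?, ?, ?)", (10, 1, "HQ")),
])
# A company whose primary location never shows up is rejected at COMMIT.
try:
    run_in_transaction(conn, [("INSERT INTO Company VALUES (?, ?, ?)", (2, "Ghost", 99))])
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```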
The alternative solution is to have a flag in the Location table that is set to 1 for the primary location and left NULL for non-primary locations. The UNIQUE constraint Locations_U1 on {CompanyID, IsPrimary} ensures a company cannot have multiple primary locations:
CREATE TABLE Location (
LocationID INT PRIMARY KEY,
CompanyID INT NOT NULL, -- References Company table, not shown here.
LocationName VARCHAR(50) NOT NULL, -- Possibly UNIQUE?
IsPrimary INT CHECK (IsPrimary IS NULL OR IsPrimary = 1), -- Use a BIT or BOOLEAN if supported by your DBMS.
CONSTRAINT Locations_U1 UNIQUE (CompanyID, IsPrimary)
);
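SQLite, like PostgreSQL, treats NULLs as distinct in a composite UNIQUE constraint, so the Location table above can be tried out unchanged with Python's sqlite3 module. A minimal sketch; the add_location helper and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Location (
        LocationID INT PRIMARY KEY,
        CompanyID INT NOT NULL, -- References Company table, not shown here.
        LocationName VARCHAR(50) NOT NULL,
        IsPrimary INT CHECK (IsPrimary IS NULL OR IsPrimary = 1),
        CONSTRAINT Locations_U1 UNIQUE (CompanyID, IsPrimary)
    )
""")


def add_location(location_id, company_id, name, is_primary):
    """Insert a location; return False if a constraint rejects the row."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO Location VALUES (?, ?, ?, ?)",
                (location_id, company_id, name, 1 if is_primary else None),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(add_location(1, 100, "HQ", True))          # True: first primary
print(add_location(2, 100, "Warehouse", False))  # True: IsPrimary is NULL
print(add_location(3, 100, "Depot", False))      # True: NULLs don't collide
print(add_location(4, 100, "Second HQ", True))   # False: (100, 1) already exists
```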
Unfortunately, this has some problems:
It can only guarantee up to "1:0..1" (but not the real "1:1") even on a DBMS that supports deferred constraints.
It requires an additional index (in support of the UNIQUE constraint). Every index brings a certain overhead, mostly for INSERT/UPDATE/DELETE performance. Furthermore, secondary indexes in clustered tables contain a copy of the PK, which may make them "fatter" than expected.
It depends on ANSI-compliant composite UNIQUE constraints, which allow duplicate rows if any (but not necessarily all) of the fields are NULL. Unfortunately, not all DBMSes follow the standard, so the above would not work out of the box under Oracle or MS SQL Server (but would work under PostgreSQL and MySQL). You could use a filtered unique index instead of the UNIQUE constraint to work around that, but not all DBMSes support that either.
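For comparison, a filtered unique index (SQLite and PostgreSQL call it a partial index) lets the flag be an ordinary NOT NULL 0/1 column, because only the rows with IsPrimary = 1 are indexed. A sketch in Python's sqlite3, assuming a hypothetical index name Location_OnePrimary; SQL Server's filtered index syntax is similar, but check your DBMS's documentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Location (
        LocationID   INTEGER PRIMARY KEY,
        CompanyID    INTEGER NOT NULL,
        LocationName TEXT NOT NULL,
        -- A plain 0/1 flag: no reliance on how UNIQUE treats NULLs.
        IsPrimary    INTEGER NOT NULL DEFAULT 0 CHECK (IsPrimary IN (0, 1))
    );
    -- Only rows with IsPrimary = 1 are indexed, so only they must be unique per company.
    CREATE UNIQUE INDEX Location_OnePrimary ON Location (CompanyID) WHERE IsPrimary = 1;
""")


def try_insert(row):
    """Insert one (LocationID, CompanyID, LocationName, IsPrimary) row; False if rejected."""
    try:
        with conn:
            conn.execute("INSERT INTO Location VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = [try_insert(row) for row in [
    (1, 100, "HQ", 1),
    (2, 100, "Warehouse", 0),
    (3, 100, "Depot", 0),
    (4, 100, "Second HQ", 1),  # second primary for company 100
]]
print(results)  # [True, True, True, False]
```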
BaBL86's solution models M:N, while your requirement seems to be 1:N. Nonetheless, that model could be "coerced" into 1:N by placing a key on {LocationID} (and on {CompanyID, TypeOfLocation} to ensure there cannot be multiple locations of the same type for the same company), but that is probably over-engineered for a simple "is primary" requirement.
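A minimal sketch of that "coerced" 1:N variant, again in Python's sqlite3, with the FKs to Companies and Locations left out for brevity. Note the side effect the keys imply: {CompanyID, TypeOfLocation} allows only one location of each type per company, not merely one primary. The link helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CompanyLocations (
        CompanyID      INTEGER NOT NULL,
        LocationID     INTEGER NOT NULL PRIMARY KEY,  -- key on {LocationID}: one company per location
        TypeOfLocation TEXT NOT NULL,                 -- 'primary', 'office', 'warehouse', ...
        UNIQUE (CompanyID, TypeOfLocation)            -- one location per type per company
    )
""")


def link(company_id, location_id, location_type):
    """Insert one link row; False if either key rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO CompanyLocations VALUES (?, ?, ?)",
                         (company_id, location_id, location_type))
        return True
    except sqlite3.IntegrityError:
        return False


print(link(100, 1, "primary"))    # True
print(link(100, 2, "warehouse"))  # True
print(link(100, 3, "primary"))    # False: company 100 already has a primary
print(link(200, 1, "primary"))    # False: location 1 already belongs to company 100
print(link(100, 4, "warehouse"))  # False: side effect, only one warehouse allowed too
```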

I think your own solution is the best one - this ensures that every company can only have one primary location. By making it a NOT NULL column, you can even enforce that every company should have a primary location.
Using BaBL86's solution, you don't have those constraints: a company can have anywhere from zero to an unlimited number of 'primary locations', which obviously shouldn't be possible.
Do note that, if you use foreign key constraints AND define primaryLocationID as a NOT NULL column, you'll run into problems, because you basically have a loop (Location points to Company, Company points to location). You cannot create a new Company (because it needs a primary location), nor can you create a new Location (because it needs a company).
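The loop is easy to reproduce. In this sketch (Python's sqlite3, columns trimmed to the keys, sample values hypothetical) both foreign keys are NOT NULL and checked immediately, so whichever row you try to insert first is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Company (
        CompanyID         INTEGER PRIMARY KEY,
        PrimaryLocationID INTEGER NOT NULL REFERENCES Location (LocationID)
    );
    CREATE TABLE Location (
        LocationID INTEGER PRIMARY KEY,
        CompanyID  INTEGER NOT NULL REFERENCES Company (CompanyID)
    );
""")


def insert_fails(sql, params):
    """True if the single INSERT is rejected by a constraint."""
    try:
        with conn:
            conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


# Both FKs are immediate, so each INSERT must see its parent row already present.
company_first = insert_fails("INSERT INTO Company VALUES (?, ?)", (1, 10))
location_first = insert_fails("INSERT INTO Location VALUES (?, ?)", (10, 1))
print(company_first, location_first)  # True True
```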

I do it with a pivot table:
CompanyLocations
--CompanyID
--LocationID
--TypeOfLocation (primary, office, warehouse etc.)
In this case you can select all locations and then use the type as you like. If you create PrimaryLocationID, you need two joins to the same table and more complex logic. It's worse than this.

Related

Why does my database table need a primary key?

In my database I have a list of users with information about them, and I also have a feature which allows a user to add other users to a shortlist. My user information is stored in one table with a primary key of the user id, and I have another table for the shortlist. The shortlist table is designed so that it has two columns and is basically just a list of pairs of names. So to find the shortlist for a particular user you retrieve all names from the second column where the id in the first column is a particular value.
The issue is that, according to many sources (such as the question "Should each and every table have a primary key?"), you should have a primary key in every table of the database.
According to this source, http://www.w3schools.com/sql/sql_primarykey.asp, a primary key is one which uniquely identifies each record in a table. So my questions are:
What is wrong with the table in my database? Why does it need a primary key?
How should I give it a primary key? Just create a new auto-incrementing column so that each entry has a unique id? There doesn't seem to be much point to that. Or would I somehow encapsulate the multiple entries that represent a shortlist into another entity in another table and link that in? I'm really confused.
If the rows are unique, you can have a two-column primary key, although maybe that's database dependent. Here's an example:
CREATE TABLE my_table
(
col_1 int NOT NULL,
col_2 varchar(255) NOT NULL,
CONSTRAINT pk_cols12 PRIMARY KEY (col_1,col_2)
)
If you already have the table, the example would be:
ALTER TABLE my_table
ADD CONSTRAINT pk_cols12 PRIMARY KEY (col_1,col_2)
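Applied to the shortlist, the two-column key looks like this. A sketch in Python's sqlite3; the table and column names (Shortlist, OwnerUserID, ListedUserID) are hypothetical stand-ins for the asker's schema. Adding the same pair twice is rejected, and looking up one user's shortlist is a plain filter on the first key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Shortlist (
        OwnerUserID  INTEGER NOT NULL,  -- user whose shortlist this is
        ListedUserID INTEGER NOT NULL,  -- user placed on that shortlist
        -- NOT NULL is spelled out: SQLite does not imply it for this kind of PK.
        PRIMARY KEY (OwnerUserID, ListedUserID)
    )
""")


def add_to_shortlist(owner, listed):
    """Return False when the pair is already on the shortlist."""
    try:
        with conn:
            conn.execute("INSERT INTO Shortlist VALUES (?, ?)", (owner, listed))
        return True
    except sqlite3.IntegrityError:
        return False


added = [add_to_shortlist(1, 2), add_to_shortlist(1, 3), add_to_shortlist(1, 2)]
shortlist = [uid for (uid,) in conn.execute(
    "SELECT ListedUserID FROM Shortlist WHERE OwnerUserID = ? ORDER BY ListedUserID", (1,))]
print(added, shortlist)  # [True, True, False] [2, 3]
```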
Primary keys must identify each record uniquely and, as mentioned before, primary keys can consist of multiple attributes (one or more columns). First, I'd recommend making sure each record is really unique in your table. Secondly, as I understand it, you left the table without a primary key, which is considered bad practice, so yes, you will need to define a key for it.
In this particular case, there is no purpose in the same pair of user IDs being stored more than once in the shortlist table. After all, that table models a set, and an element is either in the set or it isn't. Having an element "twice" in the set makes no sense[1]. To prevent that, create a composite key consisting of these two user ID fields.
Whether this composite key will also be primary, or you'll have another key (that would act as a surrogate primary key), is another matter, but either way you'll need this composite key.
Please note that under databases that support clustering (aka index-organized tables), the PK is often also the clustering key, which may have significant repercussions on performance.
[1] Unlike in a multiset.
A table with duplicate rows is not an adequate representation of a relation. It's a bag of rows, not a set of rows. If you let this happen, you'll eventually find that your counts will be off, your sums will be off, and your averages will be off. In short, you'll get confusing errors out of your data when you go to use it.
Declaring a primary key is a convenient way of preventing duplicate rows from getting into the database, even if one of the application programs makes a mistake. The index you obtain is a side effect.
Foreign key references to a single row in a table could be made by referencing any candidate key. However, it's much more convenient if you declare one of those candidate keys as a primary key, and then make all foreign key references refer to the primary key. It's just careful data management.
The one-to-one correspondence between entities in the real world and corresponding rows in the table for that entity is beyond the realm of the DBMS. It's up to your applications and even your data providers to maintain that correspondence by not inventing new rows for existing entities and not letting some new entities slip through the cracks.
Well, since you are asking: it's good practice, but in a few instances (where no joins to the data are needed) it may not be absolutely required. The biggest problem, though, is that you never really know whether requirements will change, so you really want one now rather than adding one to a 10-million-record table after the fact.
In addition to a primary key (which can span multiple columns btw) I think it is good practice to have a secondary candidate key which is a single field. This makes joins easier.
First, some theory. You may remember from high school or college algebra that, in y = f(x), f is a function if and only if for every x there is exactly one y. In relational math we would say that y is functionally dependent on x.
The same is true of your data. Suppose we are storing check numbers, checking account numbers, and amounts. Assuming that we may have several checking accounts and that for each checking account duplicate check numbers are not allowed, then amount is functionally dependent on (account, check_number). In general you want to store data together which is functionally dependent on the same thing, with no transitive dependencies. A primary key will typically be the functional dependency you specify as the primary one. This then identifies the rest of the data in the row (because it is tied to that identifier). Think of this as the natural primary key. Where possible (i.e. not using MySQL) I like to declare the primary key to be the natural one, even if it spans across columns. This gets complicated sometimes where you may have multiple interchangeable candidate keys. For example, consider:
CREATE TABLE country (
id serial not null unique,
name text primary key,
short_name text not null unique
);
This table really could have any column be the primary key. All three are perfectly acceptable candidate keys. Suppose we have a country record (232, 'United States', 'US'). Each of these fields uniquely identifies the record so if we know one we can know the others. Each one could be defined as the primary key.
I also recommend having a second, artificial candidate key which is just a machine identifier used for linking for joins. In the above example country.id does this. This can be useful for linking other records to the country table.
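A sketch of the country table with its three candidate keys, plus a hypothetical address table that links to it through the artificial id. It uses Python's sqlite3 rather than PostgreSQL, so id is supplied explicitly instead of coming from serial. A foreign key may reference any UNIQUE column, not only the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE country (
        id         INTEGER NOT NULL UNIQUE,   -- artificial candidate key for joins
        name       TEXT NOT NULL PRIMARY KEY, -- natural primary key
        short_name TEXT NOT NULL UNIQUE       -- yet another candidate key
    );
    -- Hypothetical child table that links to country through the artificial key.
    CREATE TABLE address (
        street     TEXT NOT NULL,
        country_id INTEGER NOT NULL REFERENCES country (id)
    );
""")
with conn:
    conn.execute("INSERT INTO country VALUES (232, 'United States', 'US')")
    conn.execute("INSERT INTO address VALUES ('1 Main St', 232)")


def rejected(sql):
    """True if the statement is rejected by a constraint."""
    try:
        with conn:
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


print(rejected("INSERT INTO country VALUES (233, 'Other', 'US')"))  # True: short_name taken
print(rejected("INSERT INTO address VALUES ('2 Elm St', 999)"))     # True: no country 999
print(conn.execute(
    "SELECT c.short_name FROM address a JOIN country c ON c.id = a.country_id"
).fetchall())  # [('US',)]
```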
An exception to needing a candidate key might be where duplicate records really are possible. For example, suppose we are tracking invoices. We may have a case where someone is invoiced independently for two items with one showing on each of two line items. These could be identical. In this case you probably want to add an artificial primary key because it allows you to join things to that record later. You might not have a need to do so now but you may in the future!
Create a composite primary key.
To read more about what a composite primary key is, visit
http://www.relationaldbdesign.com/relational-database-analysis/module2/concatenated-primary-keys.php

When having an identity column is not a good idea?

In tables where you need only one column as the key, and the values in that column can be integers, when shouldn't you use an identity field?
Conversely, in the same table and column, when would you generate its values manually instead of using an autogenerated value for each record?
I guess that would be the case when there are lots of inserts and deletes on the table. Am I right? What other situations could there be?
If you have already settled on the surrogate side of the Great Primary Key Debacle, then I can't find a single reason not to use identity keys. The usual alternatives are GUIDs (which have many disadvantages, primarily due to their size and randomness) and keys generated in the application layer. But creating a surrogate key in the application layer is a little harder than it seems, and it also does not cover non-application data access (i.e. batch loads, imports, other apps, etc.). The one special case is distributed applications, where GUIDs and even sequential GUIDs may offer a better alternative to site id + identity keys.
I suppose if you are creating a many-to-many linking table, where both fields are foreign keys, you don't need an identity field.
Nowadays I imagine that most ORMs expect there to be an identity field in every table. In general, it is a good practice to provide one.
I'm not sure I understand enough about your context, but I interpret your question to be:
"If I need the database to create a unique column (for whatever reason), when shouldn't it be a monotonically increasing integer (identity) column?"
In those cases, there's no reason to use anything other than the facility provided by the DBMS for the purpose; in your case (SQL Server?) that's an identity.
Except:
If you'll ever need to merge the table with data from another source, use a GUID, which will prevent duplicate keys from colliding.
If you need to merge databases it's a lot easier if you don't have to regenerate keys.
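The merge argument can be illustrated with two simulated source databases. This sketch uses Python's sqlite3 as a stand-in for SQL Server: an INTEGER PRIMARY KEY plays the role of an identity column, and random version-4 UUIDs from the uuid module play the role of GUIDs (not sequential GUIDs). The customer table and both helpers are hypothetical. Identity keys from independent sources both start at 1 and collide; the GUIDs do not, barring an astronomically unlikely clash.

```python
import sqlite3
import uuid


def export_customers(names, use_guid):
    """Simulate one source database and return its (id, name) rows."""
    conn = sqlite3.connect(":memory:")
    if use_guid:
        conn.execute("CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
    else:
        # INTEGER PRIMARY KEY: SQLite assigns 1, 2, 3, ... like an identity column.
        conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    for name in names:
        key = str(uuid.uuid4()) if use_guid else None  # None: let the database assign it
        conn.execute("INSERT INTO customer VALUES (?, ?)", (key, name))
    return conn.execute("SELECT id, name FROM customer").fetchall()


def merge(*exports):
    """Load every export into one table; False if any keys collide."""
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE customer (id PRIMARY KEY, name TEXT NOT NULL)")
    try:
        with target:
            for rows in exports:
                target.executemany("INSERT INTO customer VALUES (?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False


identity_ok = merge(export_customers(["Ann", "Bob"], False), export_customers(["Cy"], False))
guid_ok = merge(export_customers(["Ann", "Bob"], True), export_customers(["Cy"], True))
print(identity_ok, guid_ok)  # False True
```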
One case of not wanting an identity field would be in a one to one relationship. The secondary table would have as its primary key the same value as the primary table. The only reason to have an identity field in that situation would seem to be to satisfy an ORM.
You cannot (normally) specify values when inserting into identity columns, so, for example, if the column "id" was specified as an identity, the following SQL would fail:
INSERT INTO MyTable (id, name) VALUES (1, 'Smith')
In order to perform this sort of insert you need to set IDENTITY_INSERT on for that table. This is not intended to be on normally, and it can be on for only one table per session at any point in time.
If I need a surrogate, I would either use an IDENTITY column or a GUID column depending on the need for global uniqueness.
If there is a natural primary key, or the primary key is defined as a unique combination of other foreign keys, then I typically do not have an IDENTITY, nor do I use it as the primary key.
There is an exception, which is snapshot configuration tables that I am tracking with an audit trigger. In this case, there is usually a logical "primary key" (usually the date of the snapshot and the natural key of the row, like a cost center or GL account number for which the row is a configuration record), but instead of using the natural "primary key" as the primary key, I add an IDENTITY, make that the primary key, and put a unique index or constraint on the date and natural key. Although theoretically the date and natural key shouldn't change in these tables, if a user changes them instead of adding a new row and deleting the old one, I want the audit (which reflects a change to a row identified by its primary key) to really reflect a change in the row, not the disappearance of a key and the appearance of a new one.
I recently implemented a Suffix Trie in C# that could index novels, and then allow searches to be done extremely fast, linear to the size of the search string. Part of the requirements (this was a homework assignment) was to use offline storage, so I used MS SQL, and needed a structure to represent a Node in a table.
I ended up with the following structure: NodeID, Character, ParentID, etc., where NodeID was the primary key.
I didn't want this to be an autoincrementing identity, for two main reasons:
1. How would I get the value of a NodeID after adding it to the database/data table?
2. I wanted more control over generating my own IDs.

Many-to-many relationship structure in SQL Server with or without extra primary key column?

Assume that we have two tables, Roles and Reports, and there is a many-to-many relationship between them. Of course, the only solution that comes to my mind is to create a cross-table; let's name it RoleReport.
I can see two approaches to the structure of that table:
1. Columns: RoleReportId, RoleId, ReportId
PK: RoleReportId
2. Columns: RoleId, ReportId
PK: RoleId, ReportId
Is there any real difference between them (performance or whatever else)?
You will need a composite UNIQUE index on (RoleId, ReportId) anyway.
There is no point in not making it a PRIMARY KEY.
If you make it a CLUSTERED PRIMARY KEY (which is the default in SQL Server), this will be better performance-wise, since it will be smaller.
A clustered primary key will contain only two columns in each record: RoleID and ReportID, while a secondary index will contain three columns: RoleID, ReportID and RoleReportID (as a row pointer).
You may want to create an additional index on ReportID which may be used to search all Roles for a given Report.
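SQLite has no CLUSTERED keyword, but a WITHOUT ROWID table stores its rows in a B-tree keyed by the PRIMARY KEY, which is the closest analogue. A sketch of option 2 in Python's sqlite3, assuming a hypothetical index name IX_RoleReport_ReportId for the extra ReportId index suggested above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- WITHOUT ROWID: rows live in a B-tree keyed by the PRIMARY KEY,
    -- so the composite PK acts as the clustered index.
    CREATE TABLE RoleReport (
        RoleId   INTEGER NOT NULL,
        ReportId INTEGER NOT NULL,
        PRIMARY KEY (RoleId, ReportId)
    ) WITHOUT ROWID;
    -- Secondary index for "all Roles for a given Report".
    CREATE INDEX IX_RoleReport_ReportId ON RoleReport (ReportId);
""")
with conn:
    conn.executemany("INSERT INTO RoleReport VALUES (?, ?)", [(1, 10), (1, 11), (2, 10)])

try:
    with conn:
        conn.execute("INSERT INTO RoleReport VALUES (1, 10)")  # same pair again
except sqlite3.IntegrityError:
    print("duplicate pair rejected")

roles = [role for (role,) in conn.execute(
    "SELECT RoleId FROM RoleReport WHERE ReportId = ? ORDER BY RoleId", (10,))]
print(roles)  # [1, 2]
```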
There would be some point in making a surrogate key for this relationship if the two following conditions held:
You have additional attributes in your relationship (i.e. this table contains additional columns, like Date or anything else)
You have lots of tables that refer to this relationship with a FOREIGN KEY
In this case it would be nicer to have a single-column PRIMARY KEY to refer to in FOREIGN KEY relationships.
Since you don't seem to have this need, just make a composite PRIMARY KEY.
You don't actually need the RoleReportId. It adds nothing to the relationship.
Many people try to avoid using a naturally-unique key in real tables, instead opting for an artificially unique one, but I don't always agree with that. For example, if you can be sure that your SSN will never change, you can use that as a key. If it somehow does change in the future, you can fix it then.
But I don't intend arguing that point, there's good arguments on both sides. However, you certainly don't need an artificially unique key in this case since both your other fields are, and will remain, unique.
Unless you really need the RoleReportId as a foreign key in some other table (which is not usually going to be the case), go with option 2. It's going to require less storage, and that by itself will probably give a performance advantage -- plus why have a column you're never going to use?
Semantically, the difference is what you're using as the primary key.
Typically I let the remainder of my schema dictate what I do in this situation. If the cross-table is exclusively the implementation of the many-to-many relationship, I tend to use the concatenated primary key. If I'm hanging more information off the cross table, making it an entity in its own right, I'm more inclined to give it its own id independent of the two tables it's connecting.
This is, of course, subjective. I don't claim that this is the One True Way (tm).
If you have many rows, then it might be beneficial to have appropriately ordered indexes on your RoleId and/or ReportId columns, since this will speed up look up operations - but inversely this will slow down insert/delete operations. This is a classic usage profile issue...
If not required otherwise, omit the RoleReportId PK. It adds nothing to the relationship, forces the Server to generate a useless number on each insert, and leaves the other two columns unordered, which slows down lookups.
But all in all, we are talking about milliseconds here. This only becomes relevant if there is a huge amount of data (say, more than 10,000 rows).
I would suggest choosing no surrogate PK, i.e. your second option. You may use indexes or a unique constraint over the combination of both columns.
The benefit of using RoleReportID as a single-column primary key comes when you (or the other guy, depending on the structure of your company) need to write a front end that addresses individual role<->report relationships (for instance, to delete one). At that point, you may prefer the fact that you need to address only one column, instead of two, to identify the linking record.
Other than that, you don't need the RoleReportID column.

Entity relationship

I have doubts about this design (er_lp). My first doubt is how to create many-to-many relationships between entities with composite keys; my second is about using a date type as a PK. Here, each machine works daily in three shifts for different userDepts on one or more fields. To keep a record of the working and down hours of the machinery, I have used shift, taskDay and machinePlate as PKs. As you will see from the ER diagram, I ended up with too many PKs in the link tables in many places. I am worried about getting into trouble in the coding phase.
Is there a better way to do this?
Thank you !!
Dejene
See also extra information posted as a second question Entity Relationship. The material, reformatted, is:
Elaboration: Yes, 'Field' refers to areas of land. We have several cane-growing fields at different locations. It [each field?] is named and has a budget.
User does not refer to the individuals who work on the machine; users are departments. I used the 'isDone' table to link userDept with machine. A machine can be used by several departments, and many machines can work for a userDept.
A particular machine can be used for multiple tasks on a given shift. It can work for say 2 hours and can start another task on another field. We have three shifts per day, each of 8 hrs!
If I use an auto-increment PK, do you think the other keys are still important? I'd prefer not to use them!
Usually, I use an auto-increment key alone in a table. How can we create relationships that involve auto-increment keys?
Thank you for thoughtful comment!!
You always create many-to-many relationships between two tables using a third table, the rows of which contain the columns for the primary key of each table, and the combination of all columns is the primary key of the third table. The rule doesn't change for tables with composite primary keys.
CREATE TABLE Table1(Col11 ..., Col12 ..., Col1N ...,
PRIMARY KEY(Col11, Col12));
CREATE TABLE Table2(Col21 ..., Col22 ..., Col2N ...,
PRIMARY KEY(Col21, Col22));
CREATE TABLE RelationTable
(
Col11 ...,
Col12 ...,
FOREIGN KEY (Col11, Col12) REFERENCES Table1,
Col21 ...,
Col22 ...,
FOREIGN KEY (Col21, Col22) REFERENCES Table2,
PRIMARY KEY (Col11, Col12, Col21, Col22)
);
This works fine. It does suggest that you should try to keep keys simple whenever possible, but there is absolutely no need to bend over backwards adding auto-increment columns to the referenced tables if they have a natural composite key that is convenient to use. OTOH, the joins involving the relation table are harder to write if you use composite keys; I'd think several times about what I'm doing if either composite key involved more than two columns, not least because it might indicate problems in the design of the referenced tables.
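Here is the template above with concrete (but arbitrary) column types, runnable with Python's sqlite3. SQLite wants column definitions before table constraints, so the FOREIGN KEY clauses move below the columns, and the parent columns are spelled out. The final query shows why the joins get more verbose: every join must match all columns of each composite key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Table1 (Col11 INTEGER NOT NULL, Col12 TEXT NOT NULL, Col1N TEXT,
                         PRIMARY KEY (Col11, Col12));
    CREATE TABLE Table2 (Col21 INTEGER NOT NULL, Col22 TEXT NOT NULL, Col2N TEXT,
                         PRIMARY KEY (Col21, Col22));
    CREATE TABLE RelationTable (
        Col11 INTEGER NOT NULL,
        Col12 TEXT NOT NULL,
        Col21 INTEGER NOT NULL,
        Col22 TEXT NOT NULL,
        FOREIGN KEY (Col11, Col12) REFERENCES Table1 (Col11, Col12),
        FOREIGN KEY (Col21, Col22) REFERENCES Table2 (Col21, Col22),
        PRIMARY KEY (Col11, Col12, Col21, Col22)
    );
""")
with conn:
    conn.execute("INSERT INTO Table1 VALUES (1, 'a', 'first')")
    conn.execute("INSERT INTO Table2 VALUES (7, 'x', 'second')")


def link(col11, col12, col21, col22):
    """Insert one relation row; False if the PK or an FK rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO RelationTable VALUES (?, ?, ?, ?)",
                         (col11, col12, col21, col22))
        return True
    except sqlite3.IntegrityError:
        return False


print(link(1, 'a', 7, 'x'))  # True
print(link(1, 'a', 7, 'x'))  # False: the same pair again (composite PK)
print(link(1, 'b', 7, 'x'))  # False: there is no Table1 row (1, 'b') (FK)

# Every join has to match all columns of the composite key.
rows = conn.execute("""
    SELECT t1.Col1N, t2.Col2N
    FROM RelationTable AS r
    JOIN Table1 AS t1 ON t1.Col11 = r.Col11 AND t1.Col12 = r.Col12
    JOIN Table2 AS t2 ON t2.Col21 = r.Col21 AND t2.Col22 = r.Col22
""").fetchall()
print(rows)  # [('first', 'second')]
```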
Looking at the actual ER diagram - the 'er_lp' URL in the question - the 'tbl' prefix seems a trifle unnecessary; the things storing data in a database are always tables, so telling me that with the prefix is ... unnecessary. The table called 'Machine' seems to be misnamed; it does not so much describe a machine as the duty allocated to a machine on a particular shift. I'm guessing that the 'Field' table is referring to areas of land, rather than parts of a database. You have the 'IsDone' table (again, not particularly well named) that identifies the user who worked on a machine for a particular shift and hence for a particular task. That involves a link between the Machine table (which has a 3-part primary key) and the User table. It isn't clear whether a particular machine can be used for multiple tasks on a given shift. It isn't clear whether shift numbers cycle around the day or whether each shift number is unique across days, but the presumption must be that there are, say, three shifts per day, and the shift number and date is needed to identify when something occurred. Presumably, the Shift table would identify times and other such information.
The three-part primary key on Machine is fine - but it might be better to have two unique identifiers. One would be the current primary key combination; the other would be an automatically assigned number - auto-increment, serial, sequence or whatever...
Addressing the extended information.
It is not clear to me any more what you are seeking to track. If the 'Machine' table is supposed to track what a given machine was being used for, then you probably need to do some more structuring of the data. Given that a machine can be used for different tasks on different fields during a single shift, you should think, perhaps, in terms of a MachineTasks table which would identify the (date and) time when the operation started and finished and the type of operation. For repair operations, you'd store the information in a table describing repairs; for routine operations in a field, you might not need much extra information. Or maybe that is overkill.
I'm not clear whether particular tasks are performed on behalf of multiple departments, or whether you are simply trying to note that during a single shift a machine might be used by multiple departments, but one department at a time for each task. If each task is for a separate department, then simply include the department info in the main MachineTasks table as a foreign key field.
If you decide on an auto-increment key, you still need to maintain the uniqueness of the composite key. This is the biggest mistake I see people making with auto-increment fields. It isn't quite as simple as "a table with an auto-increment key must also have a second unique constraint on it", but it isn't too far off the mark.
When you use an auto-increment key, you need to retrieve the value assigned when you insert a record into the table; you then use that value in the foreign key columns when you insert other records into the other tables.
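Both points can be sketched together in Python's sqlite3: an auto-assigned surrogate key plus a UNIQUE constraint that keeps the natural composite key honest, and the assigned value read back from the cursor's lastrowid so it can go into the child table's foreign key. The table names (MachineShift, MachineTask), the helpers, and the 1..3 shift range (from the asker's "three shifts per day") are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE MachineShift (
        MachineShiftID INTEGER PRIMARY KEY,       -- auto-assigned surrogate key
        MachinePlate   TEXT NOT NULL,
        TaskDay        TEXT NOT NULL,             -- ISO date such as '2013-05-01'
        ShiftNo        INTEGER NOT NULL CHECK (ShiftNo BETWEEN 1 AND 3),
        UNIQUE (MachinePlate, TaskDay, ShiftNo)   -- the composite key must stay unique
    );
    CREATE TABLE MachineTask (
        MachineShiftID INTEGER NOT NULL REFERENCES MachineShift (MachineShiftID),
        FieldNo        INTEGER NOT NULL,
        Hours          REAL NOT NULL
    );
""")


def add_shift(plate, day, shift_no):
    """Insert a machine shift and return the key SQLite assigned to it."""
    with conn:
        cur = conn.execute(
            "INSERT INTO MachineShift (MachinePlate, TaskDay, ShiftNo) VALUES (?, ?, ?)",
            (plate, day, shift_no),
        )
    return cur.lastrowid


def add_task(shift_id, field_no, hours):
    with conn:
        conn.execute("INSERT INTO MachineTask VALUES (?, ?, ?)", (shift_id, field_no, hours))


shift_id = add_shift("AB-123", "2013-05-01", 1)
add_task(shift_id, 7, 2.0)  # two tasks on different fields in one shift
add_task(shift_id, 9, 6.0)
try:
    add_shift("AB-123", "2013-05-01", 1)  # same machine, day and shift again
except sqlite3.IntegrityError as exc:
    print("duplicate shift rejected:", exc)
```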
You need to read up on database design - I'm not sure what the current good books are as I did most of my learning a decade and more ago, and my books are consequently less likely to be available still.
One good way of not getting into trouble with primary keys is to have a single field for primary key. Usually a numeric (auto incremental) column is just fine. You can still have unique keys with multiple columns.
tblWorksOn
tblMachine
tblIsDone
...seem to be the problem tables.
It looks like you could use taskDate as the primary key for the tblMachine table. The rest can be foreign keys.
With the changes to the tblMachine table, you can then use taskDate with fieldNo for the tblWorksOn table, and taskDate with userID for the tblIsDone table. Use these two fields to create Composite Keys (CK):
e.g.
tblMachine
taskDate (PK)
tblWorksOn
fieldNo (CK)
taskDate (CK)
tblIsDone
userID (CK)
taskDate (CK)

Primary Key versus Unique Constraint?

I'm currently designing a brand new database. In school, we always learned to put a primary key in each table.
I read a lot of articles/discussions/newsgroups posts saying that it's better to use unique constraint (aka unique index for some db) instead of PK.
What's your point of view?
A Primary Key is really just a candidate key that does not allow for NULL. As such, in SQL terms - it's no different than any other unique key.
However, for our non-theoretical RDBMS's, you should have a Primary Key - I've never heard it argued otherwise. If that Primary Key is a surrogate key, then you should also have unique constraints on the natural key(s).
The important bit to walk away with is that you should have unique constraints on all the candidate (whether natural or surrogate) keys. You should then pick the one that is easiest to reference in a Foreign Key to be your Primary Key*.
You should also have a clustered index*. This could be your Primary Key, or a natural key, but it's not required to be either. You should pick your clustered index based on query usage of the table. When in doubt, the Primary Key is not a bad first choice.
Though it's technically only required to refer to a unique key in a foreign key relationship, it's accepted standard practice to greatly favor the primary key. In fact, I wouldn't be surprised if some RDBMS only allow primary key references.
Edit: It's been pointed out that Oracle's terms "clustered table" and "clustered index" mean something different than in SQL Server. The equivalent of what I'm speaking of in Oracle-ese is an Index-Organized Table, and it is recommended for OLTP tables, which, I think, would be the main focus of SO questions. I assume if you're responsible for a large OLAP data warehouse, you should already have your own opinions on database design and optimization.
Can you provide references to these articles?
I see no reason to change the tried and true methods. After all, Primary Keys are a fundamental design feature of relational databases.
Using UNIQUE to serve the same purpose sounds really hackish to me. What is their rationale?
Edit: My attention just got drawn back to this old answer. Perhaps the discussion that you read regarding PK vs. UNIQUE dealt with people making something a PK for the sole purpose of enforcing uniqueness on it. The answer to this is: if it IS a key, then make it a key; otherwise, make it UNIQUE.
A primary key is just a candidate key (unique constraint) singled out for special treatment (automatic creation of indexes, etc).
I expect that the folks who argue against them see no reason to treat one key differently than another. That's where I stand.
[Edit] Apparently I can't comment even on my own answer without 50 points.
#chris: I don't think there's any harm. "Primary Key" is really just syntactic sugar. I use them all the time, but I certainly don't think they're required. A unique key is required, yes, but not necessarily a Primary Key.
It would be very rare denormalization that would make you want to have a table without a primary key. Primary keys have unique constraints automatically just by their nature as the PK.
A unique constraint would be used when you want to guarantee uniqueness in a column in ADDITION to the primary key.
The rule of always have a PK is a good one.
http://msdn.microsoft.com/en-us/library/ms191166.aspx
You should always have a primary key.
However, I suspect your question is just worded a bit misleadingly, and you actually mean to ask whether the primary key should always be an automatically generated number (also known as a surrogate key), or some unique field which is actual meaningful data (also known as a natural key), like SSN for people, ISBN for books and so on.
This question is an age old religious war in the DB field.
My take is that natural keys are preferable if they indeed are unique and never change. However, you should be careful: even something seemingly stable like a person's SSN may change under certain circumstances.
Unless the table is a temporary table to stage the data while you work on it, you always want to put a primary key on the table and here's why:
1 - a unique constraint can allow nulls but a primary key never allows nulls. If you run a query with a join on columns with null values you eliminate those rows from the resulting data set because null is not equal to null. This is how even big companies can make accounting errors and have to restate their profits. Their queries didn't show certain rows that should have been included in the total because there were null values in some of the columns of their unique index. Shoulda used a primary key.
2 - a unique index will automatically be placed on the primary key, so you don't have to create one.
3 - many database engines (SQL Server by default, for example, and MySQL's InnoDB) will automatically put a clustered index on the primary key, making queries faster because the rows are stored contiguously in the data blocks. (This can be altered to place the clustered index on a different index if that would speed up the queries.) If a table doesn't have a clustered index, the rows won't be stored contiguously in the data blocks, making the queries slower because the read/write head has to travel all over the disk to pick up the data.
4 - many front end development environments require a primary key in order to update the table or make deletions.
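Point 1 can be reproduced with two hypothetical tables in Python's sqlite3: a UNIQUE column happily accepts NULLs, and an inner join on that column silently drops the rows whose key is NULL, because NULL = NULL is not true. The account and entry tables and their amounts are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- UNIQUE alone does not forbid NULLs: both NULL codes below are accepted.
    CREATE TABLE account (code TEXT UNIQUE, name TEXT NOT NULL);
    CREATE TABLE entry (account_code TEXT, amount INTEGER NOT NULL);
    INSERT INTO account VALUES ('A1', 'Sales'), (NULL, 'Misc 1'), (NULL, 'Misc 2');
    INSERT INTO entry VALUES ('A1', 100), (NULL, 50);
""")

total = conn.execute("SELECT SUM(amount) FROM entry").fetchone()[0]
joined_total = conn.execute("""
    SELECT SUM(e.amount)
    FROM entry e JOIN account a ON a.code = e.account_code  -- NULL = NULL is never true
""").fetchone()[0]
print(total, joined_total)  # 150 100
```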
Primary keys should be used in situations where you will be establishing relationships from this table to other tables that will reference this value. However, depending on the nature of the table and the data that you're thinking of applying the unique constraint to, you may be able to use that particular field as a natural primary key rather than having to establish a surrogate key. Of course, surrogate vs natural keys are a whole other discussion. :)
Unique keys can be used if no relationship will be established between this table and other tables: for example, a table containing a list of valid email addresses that is checked before a new user record is inserted, or some such. Unique keys can also be used for values in a table that already has a primary key but must nonetheless be absolutely unique. For example, a users table may have a user name. You wouldn't want to use the user name as the primary key, but it must still be unique so that it can be used to log in.
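A minimal sketch of both cases, again assuming SQLite via Python's sqlite3 module; `valid_emails`, `users`, `user_name` and `add_user` are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-alone lookup list: nothing references it, so a UNIQUE key is enough.
conn.execute("CREATE TABLE valid_emails (email TEXT NOT NULL UNIQUE)")
# Surrogate primary key for relationships, plus a UNIQUE user name for login.
conn.execute(
    """CREATE TABLE users (
           user_id   INTEGER PRIMARY KEY,
           user_name TEXT NOT NULL UNIQUE,
           email     TEXT NOT NULL)"""
)
conn.execute("INSERT INTO valid_emails VALUES ('ann@example.com')")

def add_user(conn, user_name, email):
    """Insert a user whose email is on the list; return False if it is not.

    Raises sqlite3.IntegrityError if user_name is already taken.
    """
    listed = conn.execute(
        "SELECT 1 FROM valid_emails WHERE email = ?", (email,)
    ).fetchone()
    if listed is None:
        return False
    conn.execute(
        "INSERT INTO users (user_name, email) VALUES (?, ?)", (user_name, email)
    )
    return True
```

Calling `add_user(conn, "ann", "ann@example.com")` succeeds once; a second call with the same user name raises `sqlite3.IntegrityError` because of the UNIQUE constraint, while an address missing from `valid_emails` simply returns False.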
We need to make a distinction here between logical constructs and physical constructs, and similarly between theory and practice.
To begin with: from a theoretical perspective, if you don't have a primary key, you don't have a table. It's just that simple. So, your question isn't whether your table should have a primary key (of course it should) but how you label it within your RDBMS.
At the physical level, most RDBMSs implement the Primary Key constraint as a Unique Index. If your chosen RDBMS is one of these, there's probably not much practical difference between designating a column as a Primary Key and simply putting a unique constraint on the column. However, one of these options captures your intent, and the other doesn't. So the decision is a no-brainer.
Furthermore, some RDBMSs make additional features available if Primary Keys are properly labelled, such as diagramming, and semi-automated foreign-key-constraint support.
Anyone who tells you to use Unique Constraints instead of Primary Keys as a general rule should provide a pretty damned good reason.
The thing is that a primary key can be one or more columns that uniquely identify a single record of a table, whereas a Unique Constraint is just a constraint on a column (or set of columns) that allows only a single instance of any given value in the table.
Personally, I use either GUIDs or auto-incrementing BIGINTs (IDENTITY columns in SQL Server) as the unique keys used for cross-referencing among my tables. Then I use other data to let the user select specific records.
For example, I'll have a list of employees with a GUID attached to every record that I use behind the scenes, but when the user selects an employee, they select based on the following fields: LastName + FirstName + EmployeeNumber.
My primary key in this scenario is LastName + FirstName + EmployeeNumber, while the unique key is the associated GUID.
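Sketched in SQLite via Python's sqlite3 (an assumption; the answer mentions SQL Server), with hypothetical `employees` and `timesheets` tables: the composite PRIMARY KEY is what users see, while other tables reference the GUID. This works because a foreign key may point at any column with a UNIQUE constraint, not only at the primary key.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute(
    """CREATE TABLE employees (
           employee_guid   TEXT NOT NULL UNIQUE,  -- used behind the scenes
           last_name       TEXT NOT NULL,
           first_name      TEXT NOT NULL,
           employee_number INTEGER NOT NULL,
           PRIMARY KEY (last_name, first_name, employee_number))"""
)
conn.execute(
    """CREATE TABLE timesheets (
           timesheet_id  INTEGER PRIMARY KEY,
           employee_guid TEXT NOT NULL REFERENCES employees (employee_guid),
           hours         REAL NOT NULL)"""
)

def hire(conn, last_name, first_name, employee_number):
    """Insert an employee with a fresh GUID and return that GUID."""
    guid = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO employees VALUES (?, ?, ?, ?)",
        (guid, last_name, first_name, employee_number),
    )
    return guid

# Other tables store the GUID, never the three user-facing columns.
guid = hire(conn, "Smith", "Jane", 1001)
conn.execute("INSERT INTO timesheets (employee_guid, hours) VALUES (?, 7.5)", (guid,))
```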
posts saying that it's better to use unique constraint (aka unique index for some db) instead of PK
I guess the only point here is the same old "natural vs surrogate keys" discussion, because unique indexes and PKs are the same thing.
translating:
posts saying that it's better to use natural key instead of surrogate key
I usually use both a PK and a UNIQUE KEY, because even if you don't declare a PK in your schema, the engine always maintains an internal row identifier for you. That's true for both SQL Server 2005 and MySQL 5.
But I don't use the PK column in my SQL queries. It is there for management purposes, like DELETEing erroneous rows or finding gaps between PK values if it's set to AUTO_INCREMENT. And it makes sense for a PK to be a number, not a set of columns or a character string.
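For the gap-finding use, one common approach is to look for ids whose successor is missing. A minimal sketch against SQLite via Python's sqlite3, with a hypothetical `orders` table; note it only reports gaps between the lowest and highest surviving ids.

```python
import sqlite3

def find_gaps(conn):
    """Return (first_missing, last_missing) ranges in orders.order_id."""
    return conn.execute(
        """SELECT o.order_id + 1 AS gap_start,
                  (SELECT MIN(n.order_id) FROM orders AS n
                   WHERE n.order_id > o.order_id) - 1 AS gap_end
           FROM orders AS o
           WHERE NOT EXISTS (SELECT 1 FROM orders AS x
                             WHERE x.order_id = o.order_id + 1)
             AND o.order_id < (SELECT MAX(order_id) FROM orders)
           ORDER BY gap_start"""
    ).fetchall()

conn = sqlite3.connect(":memory:")
# AUTOINCREMENT stops SQLite from reusing ids of deleted rows.
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY AUTOINCREMENT, item TEXT)")
conn.executemany("INSERT INTO orders (item) VALUES (?)", [("x",)] * 8)
conn.execute("DELETE FROM orders WHERE order_id IN (3, 4, 7)")  # simulate erroneous rows removed
print(find_gaps(conn))  # [(3, 4), (7, 7)]
```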
I've written a lot on this subject: if you read anything of mine, be aware that I was probably referring specifically to Jet, a.k.a. MS Access.
In Jet, tables are physically ordered on the PRIMARY KEY using a non-maintained clustered index (the rows are only re-clustered when the database is compacted). If the table has no PK but does have candidate keys defined using UNIQUE constraints on NOT NULL columns, the engine will pick one of them for the clustered index (if your table has no clustered index, it is called a heap, arguably not a table at all!). How does the engine pick a candidate key? Can it pick one that includes nullable columns? I really don't know. The point is that in Jet the only explicit way of specifying the clustered index to the engine is to use PRIMARY KEY. There are of course other uses for the PK in Jet, e.g. it will be used as the referenced key if one is omitted from a FOREIGN KEY declaration in SQL DDL, but again, why not be explicit?
The trouble with Jet is that most people who create tables are unaware of, or unconcerned about, clustered indexes. In fact, most users (I wager) put an autoincrement Autonumber column on every table and define the PRIMARY KEY solely on that column, while failing to put any unique constraints on the natural key and other candidate keys (whether an autoincrement column can actually be regarded as a key without exposing it to end users is another discussion in itself). I won't go into detail about clustered indexes here, but suffice it to say that IMO a sole autoincrement column is rarely the ideal choice.
Whatever your SQL engine, the choice of PRIMARY KEY is arbitrary and engine-specific. Usually the engine will apply special meaning to the PK, so you should find out what it is and use it to your advantage. I encourage people to use NOT NULL UNIQUE constraints in the hope they will give greater consideration to all candidate keys, especially when they have chosen to use 'autonumber' columns, which (should) have no meaning in the data model. But I'd rather folk chose one well-considered key and used PRIMARY KEY on it than put PRIMARY KEY on the autoincrement column out of habit.
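As one concrete case of an engine giving the PK special meaning, SQLite's WITHOUT ROWID tables are stored as a B-tree ordered by the declared PRIMARY KEY, roughly a clustered index. A minimal sketch via Python's sqlite3 with a hypothetical `countries` table: the well-considered natural key is the PRIMARY KEY, and the other candidate key is declared NOT NULL UNIQUE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID: rows live in a B-tree ordered by the PRIMARY KEY, and
# PRIMARY KEY columns are NOT NULL in this kind of table.
conn.execute(
    """CREATE TABLE countries (
           iso_code TEXT PRIMARY KEY,     -- chosen key, meaningful in the data
           name     TEXT NOT NULL UNIQUE  -- another candidate key
       ) WITHOUT ROWID"""
)
conn.executemany(
    "INSERT INTO countries VALUES (?, ?)",
    [("FR", "France"), ("DE", "Germany"), ("IT", "Italy")],
)

def insert_ok(iso_code, name):
    """Return True if the row was accepted by both key constraints."""
    try:
        conn.execute("INSERT INTO countries VALUES (?, ?)", (iso_code, name))
        return True
    except sqlite3.IntegrityError:
        return False

print(insert_ok("ES", "Spain"))    # True
print(insert_ok("FR", "Francia"))  # False: duplicate PRIMARY KEY
print(insert_ok("XX", "Germany"))  # False: duplicate candidate key
print(insert_ok(None, "Nowhere"))  # False: NULL key rejected
```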
Should all tables have a PK? I say yes because doing otherwise means at the very least you are missing out on a slight advantage the engine affords the PK and at worst you have no data integrity.
BTW Chris OC makes a good point here about temporal tables, which require sequenced primary keys (lowercase) which cannot be implemented via simple PRIMARY KEY constraints (SQL key words in uppercase).
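To illustrate the temporal point, here is a hypothetical valid-time table in SQLite (via Python's sqlite3). A plain PRIMARY KEY on (emp_id, valid_from) is satisfied by two overlapping periods for the same employee, so "at most one row per employee at any instant" needs an extra check. The overlap query below is just one way to detect violations (triggers are another), not a full sequenced-key implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Periods are half-open, [valid_from, valid_to), with dates as ISO-8601 text.
conn.execute(
    """CREATE TABLE emp_salary (
           emp_id     INTEGER NOT NULL,
           valid_from TEXT NOT NULL,
           valid_to   TEXT NOT NULL,
           salary     INTEGER NOT NULL,
           PRIMARY KEY (emp_id, valid_from),
           CHECK (valid_from < valid_to))"""
)
# Both rows satisfy the PRIMARY KEY, yet they overlap during 2021.
conn.executemany(
    "INSERT INTO emp_salary VALUES (?, ?, ?, ?)",
    [(1, "2020-01-01", "2022-01-01", 50000),
     (1, "2021-01-01", "2023-01-01", 55000)],
)

def overlapping_pairs(conn):
    """Return (emp_id, from_a, from_b) for each pair of overlapping periods."""
    return conn.execute(
        """SELECT a.emp_id, a.valid_from, b.valid_from
           FROM emp_salary AS a
           JOIN emp_salary AS b
             ON a.emp_id = b.emp_id
            AND a.valid_from < b.valid_from  -- report each pair once
            AND b.valid_from < a.valid_to    -- b starts before a ends"""
    ).fetchall()

print(overlapping_pairs(conn))  # [(1, '2020-01-01', '2021-01-01')]
```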
PRIMARY KEY
1. Null
It doesn't allow NULL values. Because of this, we say PRIMARY KEY = UNIQUE KEY + NOT NULL constraint.
2. INDEX
By default (in SQL Server) it creates a clustered index.
3. LIMIT
A table can have only one PRIMARY KEY, which may span one or more columns.
UNIQUE KEY
1. Null
Allows NULL values. (SQL Server allows only one NULL per UNIQUE key; PostgreSQL, MySQL and SQLite allow several.)
2. INDEX
By default (in SQL Server) it creates a UNIQUE non-clustered index.
3. LIMIT
A table can have more than one UNIQUE key, each on one or more columns.
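Some of these points depend on the engine. The sketch below checks them against SQLite via Python's sqlite3, using hypothetical tables `t` and `t2`: in SQLite a UNIQUE column accepts several NULLs, and a second PRIMARY KEY clause is rejected. Note a SQLite quirk as well: in an ordinary table, a PRIMARY KEY column only refuses NULL if NOT NULL is declared, so the sketch spells it out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite quirk: outside INTEGER PRIMARY KEY, WITHOUT ROWID and STRICT tables,
# a PRIMARY KEY column accepts NULL unless NOT NULL is declared explicitly.
conn.execute("CREATE TABLE t (pk TEXT NOT NULL PRIMARY KEY, u TEXT UNIQUE)")

def try_sql(sql):
    """Run one statement; return None on success or the error message."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.Error as exc:
        return str(exc)

print(try_sql("INSERT INTO t VALUES ('a', NULL)"))  # None
print(try_sql("INSERT INTO t VALUES ('b', NULL)"))  # None: a second NULL is allowed
print(try_sql("INSERT INTO t VALUES (NULL, 'x')"))  # NOT NULL constraint failed: t.pk
# Declaring two PRIMARY KEYs on one table is an error.
print(try_sql("CREATE TABLE t2 (a INT PRIMARY KEY, b INT PRIMARY KEY)"))
```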
If you plan to use LINQ-to-SQL, your tables will need Primary Keys if you want to perform updates, and a timestamp column if you work in a disconnected environment (such as passing an object through a WCF service application).
If you like .NET, PKs and FKs are your friends.
I submit that you may need both. Primary keys by nature need to be unique and not nullable. They are often surrogate keys, as integers make for faster joins than character fields, and especially than multi-field character joins. However, since these are often autogenerated, they do not guarantee uniqueness of the data record excluding the id itself. If your table has a natural key that should be unique, you should have a unique index on it to prevent duplicate data entry. This is a basic data integrity requirement.
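A minimal sketch of the problem, assuming SQLite via Python's sqlite3 and a hypothetical `products` table whose `sku` is the natural key: with only the autogenerated `product_id`, the same product is stored twice; adding a unique index on `sku` rejects the second copy.

```python
import sqlite3

def insert_twice(with_unique_index):
    """Insert the same product twice; return how many rows end up stored."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE products (
               product_id INTEGER PRIMARY KEY,  -- autogenerated surrogate key
               sku        TEXT NOT NULL,        -- the natural key
               name       TEXT NOT NULL)"""
    )
    if with_unique_index:
        conn.execute("CREATE UNIQUE INDEX products_sku_uq ON products (sku)")
    for _ in range(2):
        try:
            conn.execute("INSERT INTO products (sku, name) VALUES ('AB-1', 'Widget')")
        except sqlite3.IntegrityError:
            pass  # the duplicate was rejected by the unique index
    return conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]

print(insert_twice(False))  # 2: each copy simply got its own product_id
print(insert_twice(True))   # 1
```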
Edited to add: It is also a real problem that real-world data often does not have a natural key that truly guarantees uniqueness in a normalized table structure, especially if the database is people-centered. Names, even name, address and phone number combined (think father and son in the same medical practice), are not necessarily unique.
I was thinking about this problem myself. If you are using UNIQUE, you will violate 2NF: according to 2NF, every non-PK attribute has to depend on the PK, so the pair of attributes in this unique constraint has to be considered part of the PK.
Sorry for replying to this 7 years later, but I didn't want to start a new discussion.
