For my country table, I used the
country code as the primary key "AU,
US, UK, FR" etc
For my currency table, I used the currency code as the primary key "AUD, GBP, USD" etc
I think what I did is ok, but another developer wants me to change all the primary keys to an int, because the country code, currency code might change sometime in the future he said. We just don't know that, well in this case he is right, his path is the safest path to take.
Should I change the primary keys to an int to be safe rather than be sorry? Can't I just keep it?
I would use the ISO codes with char columns.
If a country ever splits then you'd get new ISO codes (say SC, WL, EN) but UK will still be valid for historic data.
It's the same for currency. A transaction in 2000 would be in the currency at that time: French Francs, Deutschmarks, Belgium Banana but not Euro.
I would say the "birth of a nation" or the disappearance of a currency is - over all - a rather rare occurence - not likely to happen several times a year, every year.
So in this regard, I would think using the ISO defined country and currency codes for your primary key should be OK.
Yes, if something happens to the Euro zone, or if another country is split into two, you might have to do some manual housekeeping - but you'd have to do this with an INT as well. In a case like this, I would argue that an artificial surrogate key (like such an INT) really only adds overhead and doesn't really help keep things easier/more explicit.
Since those codes are really short, and typically all the same length, I would however recommend using a CHAR(3) or CHAR(5) - no point in using VARCHAR for such a short string, and also, VARCHAR as variable length fields do behave quite differently (and not "better" in terms of performance) that fixed-length fields like INT or CHAR
From a logical point of view, adding a surrogate means extra columns, additional key constraints and more complex logic to query and manipulate the data. That's one thing to consider.
From a physical standpoint, in SQL Server an INTEGER key will take up more than twice as much space as a CHAR(2) or CHAR(3). That means your referencing tables and indexes get larger. It also makes any updates to those foreign key values much more expensive. I don't know your data but it seems quite possible that the referencing data in those foreign key columns could be updated much more frequently than the country code and currency code values in the parent table. In contrast, the ISO codes for currency and country almost never change so that is probably very little to worry about. By changing to INTEGER keys you could very well increase the cost of updating those foreign key values.
If you are considering such a change as a performance optimisation then I suggest you evaluate very carefully whether INTEGER keys will make updates of those values more costly or less costly. I suggest you ignore people who say "always do X". Dogma is no help in database design. Evaluate what the real impact will be in practice and make your decision accordingly.
I think that's your system will become obsolete ten time before the ISO standard about country and currency code will do.
So I really don't see any benefit for using 01010101 01010011 or 21843 instead of "US".
So long as any foreign keys that reference these primary keys are declared with ON UPDATE CASCADE, who cares if these codes change?
There's an additional benefit in querying any of the referencing tables - if all you need is the country/currency code, then there's no need to join to these tables - you've already got the code included in these tables.
And if you do decide to move to a INT surrogate, please remember to place a unique constraint on these columns still - they are a real key for these tables.
I would use INT ids as a key instead of ISO codes and explain you why:
Organization I worked for, uses "own currency" (LBP) - eg, when a user performs some transaction, he receives some amount of LBP as a bonus. Further, he can exchange those LBPs to USD, EUR, etc and vice versa, pay for services with LBPs, etc. Also, I didn't find BTC (Bitcoin) currency in ISO standard.
Yes, these are not official currencies, but it is more flexible from system and users point of view to have them as currencies but not as an additional product which user can buy and sell.
Organization I worked for do not use INTS as primary key, they use ISO codes as ids (plus those additional currencies).
Officially, LBP is ISO standard for Lebanese Pound - so they will not able to add Lebanese Pound to the system smoothly.
If you identify your currencies by Code, and in the future some new currency will be registered as ISO standard (say, LBE, or BTC) - then these currencies will conflict with "your" currencies.
Somebody mentioned here that having additional int key for currencies is an additional index.
But excuse me, is it a problem for 300 records (approximate count of currencies)? Moreover, if you use INTs as primary key for currencies, it has an additional benefit: Imagine a table with 1M transactions which holds amounts and currencies, and what is more efficient: INTS or CHARS?
So I would go for INTs.
Yes, changing to an integer key would be a good idea before it's too late.
E.g. what if Great Britain joins the Euro-zone?
It is a poor practice to use something as primary key that changes. Suppose the value changed, and then you had to update all child records. Doing so could lock up your database for hours or even days. This is why the integer FK with the unique index on the natural key is a better practice for information that is volatile.
Related
What is your choice for primary key in tables that represent a person (like Client, User, Customer, Employee etc.)? My first choice would be an social security number (SSN). However, using SSN has been discouraged because of privacy concerns and different regulations. SSN can change during person lifetime, so that is another reason against it.
I guess that one of the functions of well chosen natural primary key is to avoid duplication. I do not want a person to be registered twice in the database. Some surrogate or generated primary key does not help in avoiding duplicate entries. What is the best way to approach this?
What is the best way to guarantee uniqueness in your application for person entity and can this be handled on database level with primary key or uniqueness constraint?
I don't know which Database engine you are using, but (at least with MySQL -- see 7.4.1. Make Your Data as Small as Possible), using an integer, the shortest possible, is generally considered best for performances and memory requirements.
I would use an integer, auto_increment, for that primary key.
The idea being :
If the PK is short, it helps identifying each row (it's faster and easier to compare two integers than two long strings)
If a column used in foreign keys is short, it'll require less memory for foreign keys, as the value of that column is likely to be stored in several places.
And, then, set a UNIQUE index on an other column -- the one that determines unicity -- if that's possible and/or necessary.
Edit: Here are a couple of other questions/answers that might interest you :
What’s the best practice for Primary Keys in tables?
How do you like your primary keys?
Should I have a dedicated primary key field?
Use item specific prefixes and autonumber for primary keys?
As mentioned above, use an auto-increment as your primary key. But I don't believe this is your real question.
Your real question is how to avoid duplicate entries. In theory, there is no way - 2 people could be born on the same day, with the same name, and live in the same household, and not have a social insurance number available for one or the other. (One might be a foreigner visiting the country).
However, the combination of full name, birthdate, address, and telephone number is usually sufficient to avoid duplication. Note that addresses may be entered differently, people may have multiple phone numbers, and people may choose to omit their middle name or use an initial. It depends on how important it is to avoid duplicate entries, and how large is your userbase (and thus the likelihood of a collision).
Of course, if you can get the SSN/SIN then use that to determine uniqueness.
What attributes are available to you? Which ones does your application care about ? For example no two people can be born at exactly the same second at exactly the same place, but you probably don't have access to that data at that level of accuracy! So you need to decide, from the attributes you intend on modeling, which ones are sufficient to provide an acceptable level of data integrity. Whatever you choose, you're right in focusing on the data integrity aspects (preventing insertion of multiple rows for the same person) of your selection.
For Joins/Foreign Keys in other tables, it is best to use a surrogate key.
I've grown to consider the use of the word Primary Key as a misnomer, or at best, confusing. Any key, whether you flag it as Primary Key, Alternate Key, Unique Key, or Unique Index, is still a Key, and requires that every row in the table contain unique values for the attributes in the key. In that sense, all keys are equivilent. What matters more (Most), is whether they are natural keys (dependant on meaningful real- domain model data attributes), or surrogates (Independendant of real data attributes)
Secondly, what also matters is what you use the key for.. Surrogate keys are narrow and simple and never change (No reason to - they don't mean anything) So they are a better choice for joins or for foreign Keys in other dependant tables.
But to ensure data integrity, and prevent insertion of multiple rows for the same domain entity, they are totally useless... For that you need some kind of Natural Key, chosen from the data you have available, and which your application is modeling for some purpose.
The key does not have to be 100% immutable. If (as an example), you use Name and Phone Number and Birthdate, for example, even if a person changes their name, or their phone number, you can simply change the value in the table. As long as no other row already has the new values in their key attributes, you are fine.
Even if the key you select only works in 99.9% of the cases, (say you are unlucky enough to run into two people with the same name and phone number and were coincidentally born the same day), well, at least 99.9% of your data will be guaranteed to be accurate and consistent - and you can for example, just add time to their birthdate to make them unique, or add some other attribute to the key to distinquish them. As long as you don't have to update data values in Foreign Keys throughout your database because of the change, (since you are not using this key as a FK elsewhere) you are not facing any significant issue.
Use an autogenerated integer primary key, and then put a unique constraint on anything that you believe should be unique. But SSNs are not unique in the real world so it would be a bad idea to put a uniqueness constraint on this column unless you think turning away customers because your database won't accept them is a good business model.
I prefer natural keys, but a table person is a lost case. SSNs are not unique and not everybody has one.
I'd recommend a surrogate key. Add all the indexes you need for other candidate keys, but keeping business logic out of the key is my recommendation.
I prefer natural keys, when they can be trusted.
Unless you are running a bank or something like that, there is no reason for your clients and users to provide you with a valid SSN, or even necessarily to have one. Thus, for business reasons, you are forced to distrust SSN in the case you outline. A similar argumant would hold for any given natural key to "persons".
You have no choice but to assign an artificial (Read "surrogate") key. It might as well be an integer. Make sure it's big enough integer so you aren't going to need toexpand it real soon.
To add to #Mark and #Pascal (autoincrement integers are your best bet) -- SSN's are usefull and should be modelled correctly. Security concerns are part of application logic. You can normalize them into a separate table, and you can make them unique by providing a date-issued field.
p.s., to those who disagree with the `security in application' point, an enterprise DB will have a granular ACL model; so this won't be a sticking point.
Classic database table design would include an tableId int index(1,1) not null which results in an auto-increment int32 id field.
However, it could be useful to give these numbers some meaning, and I wanted to know what people thought about using a Char(4) field for an table containing enumerables.
Specifically I was thinking of a User Role table which had data like;
"admn" - "Administrator"
"edit" - "Editor".
I could then reference these 'codes' in my code.
Update
It makes more sense when writing code to see User.IsInRole("admin") rather than User.IsInRole(UserRoles.Admin) where Admin is an int that needs to be updated/synchronised if you ever rebuild your database.
An id field (not associated with the data) is called a surrogate key. They have their advantages and disadvantages. You can see a list of those on this Wikipedia article. Personally I feel that people overuse them and have forgotten (or have never learned) how to properly normalise a database structure.
I always tend to use a surrogate primary key in my tables.
That is, a key that has no meaning in the business domain. A primary key is just an administrative piece of data that is required by the database ...
What would be the advantage of using 'admn' as primary key in this case ?
No. No, no, no, no, and no.
Keys are not data. Keys do not have meaning. That way when meaning changes, keys do not change.
Keys do not have encoded meaning. Encoded meaning is not even potentially possibly maybe useful unless you have an algorithm for decoding it.
Since there's no way to get from "admn" to "Aministrator" without a lookup, and since the real meaning, "Administrator" sits right next to the SEKRET ENKODED "useful" key, why would I ever look at the key instead of the real data right next to it in the table?
If you want an abbreviated form, an enum-like name, or what have you, then call it that and realize it's data. That's perfectly acceptable. create table( id int not null primary key, abbv char(4), name varchar(64));
But it's not a key, it doesn't hash like a integer key, it takes up four character compares and a look for the null terminator to compare it to "edtr", as opposed to one subtraction to compare two integers. There's no decent way to generate a next key: what's next in the sequence ('admn', 'edtr', ?)?
So you've lost generate-ability, easy comparison, possibly size (if you could have used, day, a tinyint as your key), and all for an arbitrary value that's of no real use.
Use a synthetic key. If you really need an abbreviation, make that an attribute column.
A primary key is better if it's never going to change. You can change a primary key as long as you update all references to it but it's a bit of a pain.
Sometimes there's no natural non-changing column in a table and, in that case, a surrogate is useful.
But, if you have a natural non-changing value (like an employee ID that's never recycled or a set of roles that you never expect to change), it's better to use that.
Otherwise, you're introducing complexity to cater for something with a minuscule chance of happening.
That's only my opinion, my name isn't Codd or Date, so don't take this as gospel.
I think the answer is in your post. Your example of User.IsInRole("admin") will always return false as you have a primary key of char(4) and used "admn" as key for the administrator.
I would go for a surrogate Primary key which will never ever change and have the option for a 'functional primary key' to query certain roles which are used hardcoded in the code.
A key should preferrably not have any special meaning in itself. Conditions tend to change, so you may have to change a key if it's meaning changes, and changing keys is not something that you want to do.
Also, when the amount of information that you can put in the key is so limited, it's not much point of having it there.
You are comparing auto-increment vs. the fixed char key, but in that scenario you don't want an auto-incremented id.
There are different routes to go. One is to use an enum, that maps to int ids. These are not auto incremented and are the primary key of the table.
Hard code => database references are always a bad idea! (Except you set them before application start etc.)
Beside this: Should mappings from admn=>administrator really be done in the database?
And you can only store 23^4-(keywords) entries with a varchar4 with a LATIN 23char alphabet.
If you use a non-related number as primary key it's called Surrogate key and a class of person believe it's the best practice.
http://en.wikipedia.org/wiki/Surrogate_key
If you use "admn" as primary key then it's called Natural key and a different class of developers believe it's the best practice.
https://en.wikipedia.org/wiki/Natural_key
Both of them are never going to agree. It's a age old religious war (Surrogate Key vs Natural key). Therefore use what you are most comfortable with. Just know that there are a many that supports your view for everyone that disagrees --- no matter which one you chose.
We are trying to come up with a numbering system for the asset system that we are creating, there has been a few heated discussions on this topic in the office so I decided to ask the experts of SO.
Considering the database design below what would be the better option.
Example 1: Using auto surrogate keys.
================= ==================
Road_Number(PK) Segment_Number(PK)
================= ==================
1 1
Example 2: Using program generated PK
================= ==================
Road_Number(PK) Segment_Number(PK)
================= ==================
"RD00000001WCK" "00000001.1"
(the 00000001.1 means it's the first segment of the road. This increases everytime you add a new segment e.g. 00000001.2)
Example 3: Using a bit of both(adding a new column)
======================= ==========================
ID(PK) Road_Number(UK) ID(PK) Segment_Number(UK)
======================= ==========================
1 "RD00000001WCK" 1 "00000001.1"
Just a bit of background information, we will be using the Road Number and Segment Number in reports and other documents, so they have to be unique.
I have always liked keeping things simple so I prefer example 1, but I have been reading that you should not expose your primary keys in reports/documents. So now I'm thinking more along the lines of example 3.
I am also leaning towards example 3 because if we decide to change how our asset numbering is generated it won't have to do cascade updates on a primary key.
What do you think we should do?
Thanks.
EDIT: Thanks everyone for the great answers, has help me a lot.
This is really a discussion about surrogate (also called technical or synthetic) vs natural primary keys, a subject that has been extensively covered. I covered this in Database Development Mistakes Made by AppDevelopers.
Natural keys are keys based on
externally meaningful data that is
(ostensibly) unique. Common examples
are product codes, two-letter state
codes (US), social security numbers
and so on. Surrogate or technical
primary keys are those that have
absolutely no meaning outside the
system. They are invented purely for
identifying the entity and are
typically auto-incrementing fields
(SQL Server, MySQL, others) or
sequences (most notably Oracle).
In my opinion you should always
use surrogate keys. This issue has
come up in these questions:
How do you like your primary keys?
What’s the best practice for Primary Keys in tables?
Which format of primary key would you use in this situation.
Surrogate Vs. Natural/Business Keys
Should I have a dedicated primary key field?
Auto number fields are the way to go. If your keys have meaning outside your database (like asset numbers) those will quite possibly change and changing keys is problematic. Just use indexes for those things into the relevant tables.
I would personally say keep it simple and stay with an autoincremented primary key. If you need something more "Readable" in terms of display in the program, then possibly one of your other ideas, but I think that is just adding unneeded complexity to the primary key field.
I'm also very strongly in the "don't use primary keys as meaningful data" camp. Every time I have contravened that policy it has ended in tears. Sooner or later the meaningful data needs to change and if that means you have to change a primary key it can get painful. The primary key will probably be used in foreign key constraints and you can spend ages trying to sort it all out just to make a simple data change.
I always use GUIDs/UUIDs for my primary keys in every table I ever create but that's just personal preference serials or such are also good.
Don't put meaning into your PK fields unless...
It is 100% completely impossible that
the value will never change and that
No two people would ever reasonably
argue about which value should be
used for a particular row.
Go with option one and format the value in the app to look like option two or three when it is displayed.
I think the important thing to remember here is that each table in your database/design might have multiple keys. These are the Candidate Keys.
See wikipedia entry for Candidate Keys
By definition, all Candidate Keys are created equal. They are each unique identifiers for the table in question.
Your job then is to select the best candidate from the pool of Candidate Keys to serve as the Primary Key. The Primary Key will be used by other tables to establish the relational constraints, but you are free to continue using Candidate Keys to query the table.
Because Primary Keys are referenced by other structures, and therefore used in join operations, the criteria for Primary Key selection boils down to the following for me (in order of importance):
Immutable/Stable - Primary Key values should not change. If they do, you run the risk of introducing update anomolies
Not Null - most DBMS platforms require that the Primary Key attribute(s) are not null
Simple - simple datatypes and values for physical storage and performance. Integer values work well here, and this is the datatype of choice for most surrogate/auto-gen keys
Once you've identified the Candidate Keys, the criteria above can be used to select the Primary Key. If there is not a "Natural" Candidate Key meets the criteria, then a Surrogate Key that does meet the criteria can be created and used as mentioned in other answers.
Follow the Don't Use policy.
Some problems you can run into:
You need to generate keys from more than one host.
Someone will want to reserve contiguous numbers to use together.
How meaningful will people want it to be? Wars are fought over this, and you're in the first skirmish of one already. "It's already meaningful, and if we just add two more digits we can ..." i.e. you're establishing a design style that will (should) be extensible.
If you are concatenating the two, you're doing typecasts which can mess up your query Optimizer.
You'll need to reclassify roads, and redefine their boundaries (i.e. move the roads), which implies changing the primary key and maybe losing links.
There are workarounds for all this, but this is the kind of issue where workarounds proliferate and get out of control. And it doesn't take more than a couple to get beyond "Simple".
As mentioned before, keep your internal primary keys as just keys, whatever the most optimal datatype is on your platform.
However you do need to let the numbering system argument be fought out, as this is actually a business requirement, and perhaps let's call it an identification system for the asset.
If there is only going to be one identifier, then add it as a column to the main table. If there are likely to be many identification systems (and assets usually have many), you'll need two more tables
Identifier-type table Identifier-cross-ref table
type-id ------------> type-id (unique
type-name identifier-string key)
internal-id
That way different people who need to access the asset can identify in their own way. For example the server team will identify a server differently from the network team and different again from project management, accounts, etc.
Plus, you get to go to all the meetings where everyone argues with each other.
Another thing to keep in mind is that if you're importing alot of data into this system, you may find out that things like Road_Number are not as unique as you thought, and there may be operational roadblocks to fixing the problem (repainting road signs, etc.) .
While natural keys may have great meaning to the business users, if you do not have the agreement that those keys are sacred and should not be altered, you will more than likely be pulling your hair out while maintaining a database where the "product codes have to be changed to accommodate the new product line the company acquired." You need to protect the RI of your data, and integers as primary keys with auto-increment are the best way to go. Performance is also better when indexing and traversing integers than char columns.
While not appropriate as primary keys, natural keys are very appropriate for user consumption and you can enforce uniques via an index. They bring a context to the data that will make it easier for all parties to understand. Also, in the advent that you need to reload data, the natural keys can help verify that your lookups are still valid.
I would go with the surrogate key, but you may want to have a computed column that "formats" the surrogate key into a more "readable" value if that improves your reporting. The computed colum could produce example 2 from the surrogate key for instance for display purposes.
I think the surrogate key route is the way to go and the only exceptions that I make for it are join tables, where the primary key could be composed of the foreign key references. Even in these cases I'm finding that having a surrogate primary key is more useful than not.
I suspect that you really should use option #3, as many here have already said. Surrogate PKs (either Integers or GUIDs) are good practice, even if there are adequate business keys. Surrogates will reduce maintenance headaches (as you yourself have already noted).
That being said, something you may want to consider is whether or not your database is:
focused on data maintenance and transactional processing (i.e. Create/Update/Delete operations)
geared towards analysis and reporting (i.e. Queries)
In other words, are the users concerned with maintaining active data or querying largely static data to find answers?
If you are heavily focused on building an analysis and reporting DB (e.g. a data warehouse/mart) that is exposed to technical business users (e.g. report designers) who have a good grasp of the business vocabulary, then you might want to consider using natural keys based on meaningful business values. They help reduce query complexity by eliminating the need for complex joins and help the user focus on their task, not fighting the database structure.
Otherwise you're probably focused on a full CRUD DB that has to cover all the bases to some degree - this is the vast majority of situations. In which case, go with your option #3. You can always optimize for queryability in the future but you'll be hard pressed to retrofit for maintainability.
I hope you will agree with me that every design element should have single purpose.
Question is what do you think is purpose of PK? If it is to identify unique record in a table, then surrogate keys wins without much trouble. This is simple and straight.
As far as new columns in option 3 are concerned, you should check if these can be calculated (best would be to do calculation in model layer so that they can be changed easily than if calculation done in RDBMS) without too much of performance penalty from other elements. For example, you can store segment number and road number in corresponding tables and then use them to generate "00000001.1". This will allow to change asset numbering on-the-fly.
First off, option 2 is the absolute worst option. As an Index, it's a string, and that makes it slow. And it's generated based on business rules - which can change and cause a rather large headache.
Personally, I always use a separate primary key column; and I always use a GUID. Some developers prefer a simple INT over a GUID for reasons of hard-drive space. However, if the situation arises where you need to merge two databases, GUIDs will almost never collide (whereas INTs are guaranteed to collide).
Primary Keys should NEVER be seen by the user. Making it readable to the user should not be a concern. Primary Keys SHOULD be used to link with Foreign Keys. This is their purpose. The value should be machine readable and, once created, never changed.
Is there a performance gain or best practice when it comes to using unique, numeric ID fields in a database table compared to using character-based ones?
For instance, if I had two tables:
athlete
id ... 17, name ... Rickey Henderson, teamid ... 28
team
teamid ... 28, teamname ... Oakland
The athlete table, with thousands of players, would be easier to read if the teamid was, say, "OAK" or "SD" instead of "28" or "31". Let's take for granted the teamid values would remain unique and consistent in character form.
I know you CAN use characters, but is it a bad idea for indexing, filtering, etc for any reason?
Please ignore the normalization argument as these tables are more complicated than the example.
I find primary keys that are meaningless numbers cause less headaches in the long run.
Text is fine, for all the reasons you mentioned.
If the string is only a few characters, then it will be nearly as small an an integer anyway. The biggest potential drawback to using strings is the size: database performance is related to how many disk accesses are needed. Making the index twice as big, for example, could create disk-cache pressure, and increase the number of disk seeks.
I'd stay away from using text as your key - what happens in the future when you want to change the team ID for some team? You'd have to cascade that key change all through your data, when it's the exact thing a primary key can avoid. Also, though I don't have any emperical evidence, I'd think the INT key would be significantly faster than the text one.
Perhaps you can create views for your data that make it easier to consume, while still using a numeric primary key.
I'm just going to roll with your example. Doug is correct when he says that text is fine. Even for a medium sized (~50gig) database having a 3 letter code be a primary key won't kill the database. If it makes development easier, reduces joins on the other table and it's a field that users would be typing in...I say go for it. Don't do it if it's just an abbreviation that you show on a page or because it makes the athletes table look pretty. I think the key is the question "Is this a code that the user will type in and not just pick from a list?"
Let me give you an example of when I used a text column for a key. I was making software for processing medical claims. After the claim got all digitized a human had to look at the claim and then pick a code for it that designated what kind of claim it was. There were hundreds of codes...and these guys had them all memorized or crib sheets to help them. They'd been using these same codes for years. Using a 3 letter key let them just fly through the claims processing.
I recommend using ints or bigints for primary keys. Benefits include:
This allows for faster joins.
Having no semantic meaning in your primary key allows you to change the fields with semantic meaning without affecting relationships to other tables.
You can always have another column to hold team_code or something for "OAK" and "SD". Also
The standard answer is to use numbers because they are faster to index; no need to compute a hash or whatever.
If you use a meaningful value as a primary key you'll have to update it all through you're database if the team name changes.
To satisfy the above, but still make the database directly readable,
use a number field as the primary key
immediately create a view Athlete_And_Team that joins the Athlete and Team tables
Then you can use the view when you're going through the data by hand.
Are you talking about your primary key or your clustered index? Your clustered index should be the column which you will use to uniquely identify that row by most often. It also defines the logical ordering of the rows in your table. The clustered index will almost always be your primary key, but there are circumstances where they can be differant.
What are the pros/cons for including a date field as a part of a primary key?
Consider a table of parts inventory -- if you want to store the inventory level at the end of each day then a composite primary key on part_id and date_of_day would be fine. You might choose to make that a unique key and add a synthetic primary key, particularly if you have one or more tables referencing it with a foreign key constraint, but that aside, no problem.
So there's nothing necessarily wrong with it, but like any other method it can be used incorrectly as in Patrick's example.
Edit: Here's another comment to add.
I'm reminded of something I wrote a while ago on the subject of whether date values in databases really were natural or synthetic. The readable representation of a date as "YYYY-MM-DD" is certainly natural, but internally in Oracle this is stored as a numeric that just represents that particular date/time to Oracle. We can choose and change the representation of that internal value at any time (to different readable formats, or to a different calendar system entirely) without the internal value losing its meaning as that particular date and time. I think on that basis, a DATE data type lies somewhere between natural and synthetic.
I am ok with it being part of the key, but would add that you should also have an auto-incrementing sequence number be a part of the PK, and enforce that any date is written to the database as UTC, with the downstream systems than converting to local time.
A system that I worked in decided that it would be a grand idea to have an Oracle trigger write to a database whenever another table was touched, and make the sysdate be part of the primary key with no sequence number. Only problem is that if you run an update query that hits the row more than once per second, it breaks the primary key on the table that is recording the change.
if you have already decided to use a 'natural' primary key, then question is: is the date a necessary part of the primary key, or not - pros/cons are irrelevant!
There are some questions I'd ask about using a date as part of the primary key.
Does the date include the time portion? This makes things tricky because time includes time zones and daylight savings. This doesn't alter the date/time value, but may produce unexpected results in terms of sorting or retrieving values based upon a query.
I'm a big believer in the use of surrogate keys (i.e. use a sequence column as the primary key) rather than natural keys (like using a date).
A slight con would be that it's not as elegant a handle as some other identifiers
(e.g. saying to a colleague please can you look at record 475663 is a bit easier than saying please can you look at 2008-12-04 19:34:02)
There is also the risk of confusion over different date format in different locales
(e.g. 4th March 2008 - 4/3/2008 in Europe, 3/4/2008 in USA)
(My preference is always to use a seperate key column)
As always.. It depends.
What is your objective of including a date/time column in a PK? Is it to provide additional information about a record without having to actually select the row?
The main problem I can foresee here is the obvious ones, i.e. do you use a UTC date or a local date? Will the date be misinterpreted (will someone think it means local time when it means UTC)? As some of the others have suggested this might be better used in a surrogate/composite key instead? It might be better for your performance to use it in a key or index other than the Primary Key.
[Side note] This kind of reminds me of the theory behind a (1) COMB (combined GUID) although the idea here was to create a unique ID for a PK which SQL Server better indexed/required less index rebuilding, rather than to add any meaningful date/time value to a row.
(1) [http://www.informit.com/articles/article.aspx?p=25862&seqNum=7]
Dates make perfectly good primary keys, provided that they make sense as part of the natural key. I would use a date in tables like:
holiday_dates (hol_date date)
employee_salary (employee_id integer, sal_start_date date)
(What would be the point of adding the surrogate employee_salary_id above?)
For some tables, a date could be used but something else makes more sense as the primary key, e.g.:
hotel_room_booking (booking_reference)
We could have used (room_no, booking_from_date) or (room_no, booking_to_date), but a reference is more useful for communicating with the client etc. We might makes these into UNIQUE constraints, but in fact we need a more complex "no overlap" check for these.
Date as the sole or first component of a primary key causes performance problems on tables with high insert. (Table will need to be rebalanced frequently).
Often causes an issue if more then one are inserted per Date.
In most situations I consider this a bad smell, and would advise against it.
Nothing particulary wrong with this but as other posters have noted you could get into problems with time zones and locals. Also you can end up with lots of DATE() functions obfusticating your SQL.
If it is something like inventory at end of day as previously mentioned, you could perhaps consider an eight character text field like "20081202" as the second part of the primary key. This avoids the time zone locale problems and is easy enough to convert into a real date if you need to.
Remember the primary key has two functions to uniquly identify a record and to enforce uniqueness. Surrogate primary keys do niether.
It might be hard to refer to. I came across _ID + _Date as a composite PK. This composite key was also a reference/FK in another table.
Firstly it was purely confusing as there was _ID that suggested a non-composite key.
Secondly Inserts to the main table were done with SYSDATE and one needed to figure out precise time that was in that SYSDATE. You need to be precise about time that is in it when you refer to it. Otherwise it will not work...
Using the date as part of the primary key could make joins on the table significantly slower. I would prefer a surrogate key and then a unique index on the date if need be.