Classic database table design would include a tableId int identity(1,1) not null column, which results in an auto-increment int32 id field.
However, it could be useful to give these numbers some meaning, and I wanted to know what people thought about using a Char(4) field for a table containing enumerations.
Specifically I was thinking of a User Role table which had data like:
"admn" - "Administrator"
"edit" - "Editor".
I could then reference these 'codes' in my code.
Update
It makes more sense when writing code to see User.IsInRole("admin") rather than User.IsInRole(UserRoles.Admin) where Admin is an int that needs to be updated/synchronised if you ever rebuild your database.
An id field (not associated with the data) is called a surrogate key. They have their advantages and disadvantages. You can see a list of those on this Wikipedia article. Personally I feel that people overuse them and have forgotten (or have never learned) how to properly normalise a database structure.
I always tend to use a surrogate primary key in my tables.
That is, a key that has no meaning in the business domain. A primary key is just an administrative piece of data that is required by the database ...
What would be the advantage of using 'admn' as the primary key in this case?
No. No, no, no, no, and no.
Keys are not data. Keys do not have meaning. That way when meaning changes, keys do not change.
Keys do not have encoded meaning. Encoded meaning is not even potentially possibly maybe useful unless you have an algorithm for decoding it.
Since there's no way to get from "admn" to "Administrator" without a lookup, and since the real meaning, "Administrator", sits right next to the SEKRET ENKODED "useful" key, why would I ever look at the key instead of the real data right next to it in the table?
If you want an abbreviated form, an enum-like name, or what have you, then call it that and realize it's data. That's perfectly acceptable. create table user_role (id int not null primary key, abbv char(4), name varchar(64));
But it's not a key: it doesn't hash like an integer key, and comparing it to "edtr" takes four character compares plus a check for the null terminator, as opposed to one subtraction to compare two integers. There's no decent way to generate the next key: what's next in the sequence ('admn', 'edtr', ?)?
So you've lost generate-ability, easy comparison, possibly size (if you could have used, say, a tinyint as your key), and all for an arbitrary value that's of no real use.
Use a synthetic key. If you really need an abbreviation, make that an attribute column.
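A minimal sketch of that layout, using Python's built-in sqlite3 module so it can be run as-is. The table name user_role and the sample rows are assumptions for illustration; the columns follow the create table statement above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE user_role (
        id   INTEGER PRIMARY KEY,      -- synthetic key, no business meaning
        abbv CHAR(4) NOT NULL UNIQUE,  -- the abbreviation is data, kept unique
        name VARCHAR(64) NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO user_role (abbv, name) VALUES (?, ?)",
    [("admn", "Administrator"), ("edtr", "Editor")],
)

# The database generates the next key; nobody has to invent one.
rows = conn.execute("SELECT id, abbv, name FROM user_role ORDER BY id").fetchall()

# Renaming the abbreviation touches one row and leaves the key alone.
conn.execute("UPDATE user_role SET abbv = 'edit' WHERE abbv = 'edtr'")
editor_id = conn.execute("SELECT id FROM user_role WHERE abbv = 'edit'").fetchone()[0]
```

Anything that references the role by id is unaffected by the rename, which is the point being made about keeping meaning out of the key.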
A primary key is better if it's never going to change. You can change a primary key as long as you update all references to it but it's a bit of a pain.
Sometimes there's no natural non-changing column in a table and, in that case, a surrogate is useful.
But, if you have a natural non-changing value (like an employee ID that's never recycled or a set of roles that you never expect to change), it's better to use that.
Otherwise, you're introducing complexity to cater for something with a minuscule chance of happening.
That's only my opinion, my name isn't Codd or Date, so don't take this as gospel.
I think the answer is in your post. Your example of User.IsInRole("admin") will always return false as you have a primary key of char(4) and used "admn" as key for the administrator.
I would go for a surrogate Primary key which will never ever change and have the option for a 'functional primary key' to query certain roles which are used hardcoded in the code.
A key should preferably not have any special meaning in itself. Conditions tend to change, so you may have to change a key if its meaning changes, and changing keys is not something that you want to do.
Also, when the amount of information that you can put in the key is so limited, there's not much point in having it there.
You are comparing auto-increment vs. the fixed char key, but in that scenario you don't want an auto-incremented id.
There are different routes to go. One is to use an enum, that maps to int ids. These are not auto incremented and are the primary key of the table.
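The enum route could look roughly like the sketch below. The UserRole members, the table layout and the role_name helper are all hypothetical; the only idea taken from the answer is that the ids are fixed in code and inserted explicitly rather than auto-incremented.

```python
import sqlite3
from enum import IntEnum


class UserRole(IntEnum):
    # Hypothetical roles; the ids are fixed here, not generated by the database.
    ADMINISTRATOR = 1
    EDITOR = 2


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_role (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")
# Seed the table from the enum so the ids in code and database start out in sync.
conn.executemany(
    "INSERT INTO user_role (id, name) VALUES (?, ?)",
    [(role.value, role.name.title()) for role in UserRole],
)


def role_name(role: UserRole) -> str:
    """Look up the display name stored for an enum member."""
    row = conn.execute("SELECT name FROM user_role WHERE id = ?", (role.value,)).fetchone()
    return row[0]
```

With this arrangement a check such as role_name(UserRole.ADMINISTRATOR) cannot silently miss because of a misspelled string literal; a misspelled member name fails as soon as that line runs.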
Hard code => database references are always a bad idea! (Unless you set them before the application starts, etc.)
Beside this: Should mappings from admn=>administrator really be done in the database?
And with a char(4) key over a 23-character Latin alphabet you can only store 23^4 (279,841) entries, minus any reserved keywords.
If you use a non-related number as the primary key, it's called a surrogate key, and one class of developers believes it's the best practice.
http://en.wikipedia.org/wiki/Surrogate_key
If you use "admn" as the primary key, then it's called a natural key, and a different class of developers believes it's the best practice.
https://en.wikipedia.org/wiki/Natural_key
The two camps are never going to agree. It's an age-old religious war (surrogate key vs natural key). Therefore use what you are most comfortable with. Just know that there are as many people who support your view as who disagree with it, no matter which one you choose.
Related
My question is more or less the opposite of this one: why would one ever want to bother finding a natural primary key in a relation when using a sequence as a surrogate seems so much easier?
BradC mentioned in his answer to a related question that the criteria for choosing a primary key are uniqueness, irreducibility, simplicity, stability and familiarity. It looks to me like using a sequence sacrifices the last criterion in order to provide an optimal solution for the first four.
If I hold those criteria to be correct, I can reformulate my question as: In which circumstances would one ever consider it advantageous to complicate one's life by looking for a unique, irreducible, simple and stable key that is also familiar?
To get a meaningful value from a lookup table without doing unnecessary joins.
Example case: garments references a lookup table of colors, which has an auto-increment primary key. Getting the name of the color requires a join:
SELECT c.color
FROM garments g
JOIN colors c USING (color_id);
Simpler example: the colors.color itself is the primary key of that table, and therefore it's the foreign key column in any table that references it.
SELECT g.color
FROM garments g;
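To make the two queries above concrete, here is a small runnable sketch using Python's sqlite3 module. The _s and _n table suffixes (surrogate and natural variants side by side) and the sample rows are assumptions added for the comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Surrogate version: garments point at an auto-increment color_id.
    CREATE TABLE colors_s (color_id INTEGER PRIMARY KEY, color TEXT NOT NULL UNIQUE);
    CREATE TABLE garments_s (
        garment_id INTEGER PRIMARY KEY,
        color_id   INTEGER NOT NULL REFERENCES colors_s (color_id)
    );
    INSERT INTO colors_s (color) VALUES ('red'), ('blue');
    INSERT INTO garments_s (color_id) VALUES (2), (1);

    -- Natural version: the color name itself is the key.
    CREATE TABLE colors_n (color TEXT PRIMARY KEY);
    CREATE TABLE garments_n (
        garment_id INTEGER PRIMARY KEY,
        color      TEXT NOT NULL REFERENCES colors_n (color)
    );
    INSERT INTO colors_n (color) VALUES ('red'), ('blue');
    INSERT INTO garments_n (color) VALUES ('blue'), ('red');
    """
)

# Surrogate key: the name is only reachable through a join.
with_join = conn.execute(
    "SELECT c.color FROM garments_s g JOIN colors_s c USING (color_id) ORDER BY g.garment_id"
).fetchall()

# Natural key: the name is already in the referencing table.
without_join = conn.execute(
    "SELECT g.color FROM garments_n g ORDER BY g.garment_id"
).fetchall()
```

Both queries return the same color names; the natural-key version simply reads them from the garments table directly.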
The answer is data integrity. Instances of entities in the business domain outside the database are by definition identifiable things. If you fail to give them external, real world identifiers in the database then that database stands little chance of modelling reality correctly.
A natural key[1] is what ensures facts in the database are identifiable with actual things in the reality you are trying to model. They are the means which users rely on when they act on and update the data in the database. The constraints that enforce those keys are an implementation of business rules. If your database is to model the business domain accurately then natural keys are not just desirable but essential. If you doubt that then you haven't done enough business analysis. Just ask your customers how they think their business would operate if they were left looking at screens full of duplicate data!
[1] I recommend calling them business keys or domain keys rather than natural keys. Those are far more appropriate and less overloaded terms even though they mean exactly the same thing.
You generally need to identify what the unique key on the data is anyway, as you still need to be able to ensure that the data is not duplicated.
The strength of the synthetic key is that it allows the values of the unique natural key to be modifiable in future, with child records not needing to be updated.
So you're not really skipping the "identify the key" part of the design by using a synthetic primary key, you're just insulating yourself from the possibility of the values changing.
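A short sketch of that insulation, assuming hypothetical product and order_line tables: the natural key is still declared UNIQUE, but child rows reference only the synthetic key, so changing the natural value is a one-row update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE product (
        product_id   INTEGER PRIMARY KEY,   -- synthetic key used by children
        product_code TEXT NOT NULL UNIQUE   -- natural key, still enforced
    );
    CREATE TABLE order_line (
        order_line_id INTEGER PRIMARY KEY,
        product_id    INTEGER NOT NULL REFERENCES product (product_id)
    );
    INSERT INTO product (product_code) VALUES ('AB-100');
    INSERT INTO order_line (product_id) VALUES (1), (1);
    """
)

# The business renames the code: one row changes, no child rows are touched.
conn.execute("UPDATE product SET product_code = 'XY-100' WHERE product_code = 'AB-100'")

codes_seen_by_children = conn.execute(
    """
    SELECT p.product_code
    FROM order_line ol JOIN product p ON p.product_id = ol.product_id
    """
).fetchall()
```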
Below are the benefits of using a natural primary key:
If you need a unique constraint on a column anyway, making that column the primary key fulfills the need, provided the column is never supposed to receive a null value. So it saves you the cost of one extra key.
In some RDBMSs, declaring a primary key automatically creates a B-tree index on that column. If you choose a natural primary key that matches your access pattern, that is the icing on the cake, because you kill two birds with one stone: you save the cost of an extra index and make your queries faster by having that meaningful primary key in the WHERE clause.
Last but not least, you save the space of one extra column/key/index.
What is your choice for primary key in tables that represent a person (like Client, User, Customer, Employee etc.)? My first choice would be a social security number (SSN). However, using the SSN has been discouraged because of privacy concerns and different regulations. An SSN can also change during a person's lifetime, so that is another reason against it.
I guess that one of the functions of well chosen natural primary key is to avoid duplication. I do not want a person to be registered twice in the database. Some surrogate or generated primary key does not help in avoiding duplicate entries. What is the best way to approach this?
What is the best way to guarantee uniqueness in your application for person entity and can this be handled on database level with primary key or uniqueness constraint?
I don't know which database engine you are using, but (at least with MySQL; see 7.4.1. Make Your Data as Small as Possible) using an integer, the shortest possible, is generally considered best for performance and memory requirements.
I would use an integer, auto_increment, for that primary key.
The idea being:
If the PK is short, it helps in identifying each row (it's faster and easier to compare two integers than two long strings).
If a column used in foreign keys is short, it'll require less memory for foreign keys, as the value of that column is likely to be stored in several places.
And then set a UNIQUE index on another column, the one that determines uniqueness, if that's possible and/or necessary.
Edit: Here are a couple of other questions/answers that might interest you :
What’s the best practice for Primary Keys in tables?
How do you like your primary keys?
Should I have a dedicated primary key field?
Use item specific prefixes and autonumber for primary keys?
As mentioned above, use an auto-increment as your primary key. But I don't believe this is your real question.
Your real question is how to avoid duplicate entries. In theory, there is no way - 2 people could be born on the same day, with the same name, and live in the same household, and not have a social insurance number available for one or the other. (One might be a foreigner visiting the country).
However, the combination of full name, birthdate, address, and telephone number is usually sufficient to avoid duplication. Note that addresses may be entered differently, people may have multiple phone numbers, and people may choose to omit their middle name or use an initial. It depends on how important it is to avoid duplicate entries, and how large is your userbase (and thus the likelihood of a collision).
Of course, if you can get the SSN/SIN then use that to determine uniqueness.
What attributes are available to you? Which ones does your application care about ? For example no two people can be born at exactly the same second at exactly the same place, but you probably don't have access to that data at that level of accuracy! So you need to decide, from the attributes you intend on modeling, which ones are sufficient to provide an acceptable level of data integrity. Whatever you choose, you're right in focusing on the data integrity aspects (preventing insertion of multiple rows for the same person) of your selection.
For Joins/Foreign Keys in other tables, it is best to use a surrogate key.
I've grown to consider the use of the term Primary Key as a misnomer, or at best, confusing. Any key, whether you flag it as Primary Key, Alternate Key, Unique Key, or Unique Index, is still a key, and requires that every row in the table contain unique values for the attributes in the key. In that sense, all keys are equivalent. What matters more (most) is whether they are natural keys (dependent on meaningful real-domain model data attributes) or surrogates (independent of real data attributes).
Secondly, what also matters is what you use the key for. Surrogate keys are narrow and simple and never change (no reason to, since they don't mean anything), so they are a better choice for joins or for foreign keys in other dependent tables.
But to ensure data integrity, and prevent insertion of multiple rows for the same domain entity, they are totally useless... For that you need some kind of Natural Key, chosen from the data you have available, and which your application is modeling for some purpose.
The key does not have to be 100% immutable. If, for example, you use Name and Phone Number and Birthdate, then even if a person changes their name or their phone number, you can simply change the value in the table. As long as no other row already has the new values in its key attributes, you are fine.
Even if the key you select only works in 99.9% of the cases (say you are unlucky enough to run into two people with the same name and phone number who were coincidentally born on the same day), well, at least 99.9% of your data will be guaranteed to be accurate and consistent, and you can, for example, just add a time to their birthdate to make them unique, or add some other attribute to the key to distinguish them. As long as you don't have to update data values in foreign keys throughout your database because of the change (since you are not using this key as an FK elsewhere), you are not facing any significant issue.
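A minimal sketch of this split, assuming a hypothetical person table: the surrogate person_id is what other tables would reference, while a composite UNIQUE constraint on name, phone and birthdate does the duplicate prevention.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,          -- surrogate, used for joins
        full_name TEXT NOT NULL,
        phone     TEXT NOT NULL,
        birthdate TEXT NOT NULL,                -- ISO 8601 date string
        UNIQUE (full_name, phone, birthdate)    -- natural key, blocks duplicates
    )
    """
)

insert = "INSERT INTO person (full_name, phone, birthdate) VALUES (?, ?, ?)"
conn.execute(insert, ("Ann Lee", "555-0100", "1980-02-03"))

try:
    # Same person entered a second time: the natural key rejects the row.
    conn.execute(insert, ("Ann Lee", "555-0100", "1980-02-03"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A changed phone number is a plain update; person_id stays the same.
conn.execute("UPDATE person SET phone = '555-0199' WHERE person_id = 1")
person_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```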
Use an autogenerated integer primary key, and then put a unique constraint on anything that you believe should be unique. But SSNs are not unique in the real world so it would be a bad idea to put a uniqueness constraint on this column unless you think turning away customers because your database won't accept them is a good business model.
I prefer natural keys, but a table person is a lost case. SSNs are not unique and not everybody has one.
I'd recommend a surrogate key. Add all the indexes you need for other candidate keys, but keeping business logic out of the key is my recommendation.
I prefer natural keys, when they can be trusted.
Unless you are running a bank or something like that, there is no reason for your clients and users to provide you with a valid SSN, or even necessarily to have one. Thus, for business reasons, you are forced to distrust the SSN in the case you outline. A similar argument would hold for any given natural key to "persons".
You have no choice but to assign an artificial (read "surrogate") key. It might as well be an integer. Make sure it's a big enough integer that you aren't going to need to expand it real soon.
To add to @Mark and @Pascal (autoincrement integers are your best bet): SSNs are useful and should be modelled correctly. Security concerns are part of application logic. You can normalize them into a separate table, and you can make them unique by providing a date-issued field.
p.s., to those who disagree with the 'security in application' point, an enterprise DB will have a granular ACL model, so this won't be a sticking point.
We are trying to come up with a numbering system for the asset system that we are creating. There have been a few heated discussions on this topic in the office, so I decided to ask the experts of SO.
Considering the database design below, what would be the better option?
Example 1: Using auto surrogate keys.
=================    ==================
Road_Number(PK)      Segment_Number(PK)
=================    ==================
1                    1
Example 2: Using program generated PK
=================    ==================
Road_Number(PK)      Segment_Number(PK)
=================    ==================
"RD00000001WCK"      "00000001.1"
(the 00000001.1 means it's the first segment of the road. This increases every time you add a new segment, e.g. 00000001.2)
Example 3: Using a bit of both(adding a new column)
=======================      ==========================
ID(PK)  Road_Number(UK)      ID(PK)  Segment_Number(UK)
=======================      ==========================
1       "RD00000001WCK"      1       "00000001.1"
Just a bit of background information, we will be using the Road Number and Segment Number in reports and other documents, so they have to be unique.
I have always liked keeping things simple so I prefer example 1, but I have been reading that you should not expose your primary keys in reports/documents. So now I'm thinking more along the lines of example 3.
I am also leaning towards example 3 because if we decide to change how our asset numbering is generated it won't have to do cascade updates on a primary key.
What do you think we should do?
Thanks.
EDIT: Thanks everyone for the great answers, they have helped me a lot.
This is really a discussion about surrogate (also called technical or synthetic) vs natural primary keys, a subject that has been extensively covered. I covered this in Database Development Mistakes Made by Application Developers.
Natural keys are keys based on externally meaningful data that is (ostensibly) unique. Common examples are product codes, two-letter state codes (US), social security numbers and so on. Surrogate or technical primary keys are those that have absolutely no meaning outside the system. They are invented purely for identifying the entity and are typically auto-incrementing fields (SQL Server, MySQL, others) or sequences (most notably Oracle).
In my opinion you should always use surrogate keys. This issue has come up in these questions:
How do you like your primary keys?
What’s the best practice for Primary Keys in tables?
Which format of primary key would you use in this situation.
Surrogate Vs. Natural/Business Keys
Should I have a dedicated primary key field?
Auto number fields are the way to go. If your keys have meaning outside your database (like asset numbers), those will quite possibly change, and changing keys is problematic. Just put indexes on those columns in the relevant tables.
I would personally say keep it simple and stay with an autoincremented primary key. If you need something more "Readable" in terms of display in the program, then possibly one of your other ideas, but I think that is just adding unneeded complexity to the primary key field.
I'm also very strongly in the "don't use primary keys as meaningful data" camp. Every time I have contravened that policy it has ended in tears. Sooner or later the meaningful data needs to change and if that means you have to change a primary key it can get painful. The primary key will probably be used in foreign key constraints and you can spend ages trying to sort it all out just to make a simple data change.
I always use GUIDs/UUIDs for my primary keys in every table I ever create, but that's just personal preference; serials or such are also good.
Don't put meaning into your PK fields unless:
It is 100% completely impossible that the value will ever change, and
No two people would ever reasonably argue about which value should be used for a particular row.
Go with option one and format the value in the app to look like option two or three when it is displayed.
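As a rough sketch of that suggestion, the display values from option two can be derived from plain integer keys in application code. The function names are hypothetical, and so is the reading of the format (an "RD" prefix, an 8-digit zero-padded number, and a "WCK" suffix whose meaning the question does not explain).

```python
def format_road_number(road_id: int, suffix: str = "WCK") -> str:
    """Render a surrogate road id in the 'RD00000001WCK' style."""
    return f"RD{road_id:08d}{suffix}"


def format_segment_number(road_id: int, segment_seq: int) -> str:
    """Render a segment as '<8-digit road id>.<sequence within the road>'."""
    return f"{road_id:08d}.{segment_seq}"
```

If the numbering scheme changes later, only these two functions change; the stored keys and every foreign key that points at them stay as they are.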
I think the important thing to remember here is that each table in your database/design might have multiple keys. These are the Candidate Keys.
See wikipedia entry for Candidate Keys
By definition, all Candidate Keys are created equal. They are each unique identifiers for the table in question.
Your job then is to select the best candidate from the pool of Candidate Keys to serve as the Primary Key. The Primary Key will be used by other tables to establish the relational constraints, but you are free to continue using Candidate Keys to query the table.
Because Primary Keys are referenced by other structures, and therefore used in join operations, the criteria for Primary Key selection boils down to the following for me (in order of importance):
Immutable/Stable - Primary Key values should not change. If they do, you run the risk of introducing update anomalies
Not Null - most DBMS platforms require that the Primary Key attribute(s) are not null
Simple - simple datatypes and values for physical storage and performance. Integer values work well here, and this is the datatype of choice for most surrogate/auto-gen keys
Once you've identified the Candidate Keys, the criteria above can be used to select the Primary Key. If there is no "Natural" Candidate Key that meets the criteria, then a Surrogate Key that does meet the criteria can be created and used as mentioned in other answers.
Follow the Don't Use policy.
Some problems you can run into:
You need to generate keys from more than one host.
Someone will want to reserve contiguous numbers to use together.
How meaningful will people want it to be? Wars are fought over this, and you're in the first skirmish of one already. "It's already meaningful, and if we just add two more digits we can ..." i.e. you're establishing a design style that will (should) be extensible.
If you are concatenating the two, you're doing typecasts which can mess up your query Optimizer.
You'll need to reclassify roads, and redefine their boundaries (i.e. move the roads), which implies changing the primary key and maybe losing links.
There are workarounds for all this, but this is the kind of issue where workarounds proliferate and get out of control. And it doesn't take more than a couple to get beyond "Simple".
As mentioned before, keep your internal primary keys as just keys, whatever the most optimal datatype is on your platform.
However you do need to let the numbering system argument be fought out, as this is actually a business requirement, and perhaps let's call it an identification system for the asset.
If there is only going to be one identifier, then add it as a column to the main table. If there are likely to be many identification systems (and assets usually have many), you'll need two more tables
Identifier-type table      Identifier-cross-ref table
type-id ------------>      type-id            (unique
type-name                  identifier-string   key)
                           internal-id
That way different people who need to access the asset can identify in their own way. For example the server team will identify a server differently from the network team and different again from project management, accounts, etc.
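The two tables sketched above might be declared like this. The asset table, the sample identifiers and the find_asset helper are assumptions added so the example runs; the column names follow the diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE asset (internal_id INTEGER PRIMARY KEY, description TEXT NOT NULL);
    CREATE TABLE identifier_type (type_id INTEGER PRIMARY KEY, type_name TEXT NOT NULL UNIQUE);
    CREATE TABLE identifier_xref (
        type_id           INTEGER NOT NULL REFERENCES identifier_type (type_id),
        identifier_string TEXT    NOT NULL,
        internal_id       INTEGER NOT NULL REFERENCES asset (internal_id),
        UNIQUE (type_id, identifier_string)  -- one asset per identifier per scheme
    );
    INSERT INTO asset (description) VALUES ('web server');
    INSERT INTO identifier_type (type_name) VALUES ('server team'), ('accounts');
    INSERT INTO identifier_xref VALUES (1, 'SRV-0042', 1), (2, 'FA-2009-17', 1);
    """
)


def find_asset(type_name: str, identifier: str):
    """Resolve a team-specific identifier to the internal asset id, or None."""
    row = conn.execute(
        """
        SELECT x.internal_id
        FROM identifier_xref x JOIN identifier_type t ON t.type_id = x.type_id
        WHERE t.type_name = ? AND x.identifier_string = ?
        """,
        (type_name, identifier),
    ).fetchone()
    return row[0] if row else None
```

Each team looks the asset up under its own scheme, and both lookups land on the same internal key.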
Plus, you get to go to all the meetings where everyone argues with each other.
Another thing to keep in mind is that if you're importing a lot of data into this system, you may find out that things like Road_Number are not as unique as you thought, and there may be operational roadblocks to fixing the problem (repainting road signs, etc.).
While natural keys may have great meaning to the business users, if you do not have the agreement that those keys are sacred and should not be altered, you will more than likely be pulling your hair out while maintaining a database where the "product codes have to be changed to accommodate the new product line the company acquired." You need to protect the RI of your data, and integers as primary keys with auto-increment are the best way to go. Performance is also better when indexing and traversing integers than char columns.
While not appropriate as primary keys, natural keys are very appropriate for user consumption, and you can enforce uniqueness via a unique index. They bring a context to the data that will make it easier for all parties to understand. Also, in the event that you need to reload data, the natural keys can help verify that your lookups are still valid.
I would go with the surrogate key, but you may want to have a computed column that "formats" the surrogate key into a more "readable" value if that improves your reporting. The computed column could, for instance, produce example 2 from the surrogate key for display purposes.
I think the surrogate key route is the way to go and the only exceptions that I make for it are join tables, where the primary key could be composed of the foreign key references. Even in these cases I'm finding that having a surrogate primary key is more useful than not.
I suspect that you really should use option #3, as many here have already said. Surrogate PKs (either Integers or GUIDs) are good practice, even if there are adequate business keys. Surrogates will reduce maintenance headaches (as you yourself have already noted).
That being said, something you may want to consider is whether or not your database is:
focused on data maintenance and transactional processing (i.e. Create/Update/Delete operations)
geared towards analysis and reporting (i.e. Queries)
In other words, are the users concerned with maintaining active data or querying largely static data to find answers?
If you are heavily focused on building an analysis and reporting DB (e.g. a data warehouse/mart) that is exposed to technical business users (e.g. report designers) who have a good grasp of the business vocabulary, then you might want to consider using natural keys based on meaningful business values. They help reduce query complexity by eliminating the need for complex joins and help the user focus on their task, not fighting the database structure.
Otherwise you're probably focused on a full CRUD DB that has to cover all the bases to some degree - this is the vast majority of situations. In which case, go with your option #3. You can always optimize for queryability in the future but you'll be hard pressed to retrofit for maintainability.
I hope you will agree with me that every design element should have a single purpose.
The question is: what do you think is the purpose of a PK? If it is to identify a unique record in a table, then surrogate keys win without much trouble. This is simple and straightforward.
As far as the new columns in option 3 are concerned, you should check whether they can be calculated from other elements without too much of a performance penalty (best would be to do the calculation in the model layer, where it can be changed more easily than if it were done in the RDBMS). For example, you can store the segment number and road number in the corresponding tables and then use them to generate "00000001.1". This will allow you to change asset numbering on the fly.
First off, option 2 is the absolute worst option. As an Index, it's a string, and that makes it slow. And it's generated based on business rules - which can change and cause a rather large headache.
Personally, I always use a separate primary key column, and I always use a GUID. Some developers prefer a simple INT over a GUID for reasons of hard-drive space. However, if the situation arises where you need to merge two databases, GUIDs will almost never collide (whereas auto-increment INTs that start from the same seed in both databases are guaranteed to collide).
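The merge argument can be illustrated with a few lines of Python. The two "databases" here are just sets of key values, an assumption made to keep the sketch short, and the GUIDs are random version 4 UUIDs from the standard uuid module.

```python
import uuid

# Two independent databases that both number their rows 1, 2, 3 with an INT key.
db_a_int_keys = {1, 2, 3}
db_b_int_keys = {1, 2, 3}
int_collisions = db_a_int_keys & db_b_int_keys

# The same rows keyed by random (version 4) UUIDs instead.
db_a_guid_keys = {uuid.uuid4() for _ in range(3)}
db_b_guid_keys = {uuid.uuid4() for _ in range(3)}
guid_collisions = db_a_guid_keys & db_b_guid_keys
```

Every integer key overlaps, so a merge has to renumber one side and fix up its foreign keys. A collision between random UUIDs is possible in principle but vanishingly unlikely.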
Primary Keys should NEVER be seen by the user. Making it readable to the user should not be a concern. Primary Keys SHOULD be used to link with Foreign Keys. This is their purpose. The value should be machine readable and, once created, never changed.
Is there a performance gain or best practice when it comes to using unique, numeric ID fields in a database table compared to using character-based ones?
For instance, if I had two tables:
athlete
id ... 17, name ... Rickey Henderson, teamid ... 28
team
teamid ... 28, teamname ... Oakland
The athlete table, with thousands of players, would be easier to read if the teamid was, say, "OAK" or "SD" instead of "28" or "31". Let's take for granted the teamid values would remain unique and consistent in character form.
I know you CAN use characters, but is it a bad idea for indexing, filtering, etc for any reason?
Please ignore the normalization argument as these tables are more complicated than the example.
I find primary keys that are meaningless numbers cause fewer headaches in the long run.
Text is fine, for all the reasons you mentioned.
If the string is only a few characters, then it will be nearly as small as an integer anyway. The biggest potential drawback to using strings is the size: database performance is related to how many disk accesses are needed. Making the index twice as big, for example, could create disk-cache pressure and increase the number of disk seeks.
I'd stay away from using text as your key. What happens in the future when you want to change the team ID for some team? You'd have to cascade that key change all through your data, when that's the exact thing a primary key can avoid. Also, though I don't have any empirical evidence, I'd think the INT key would be significantly faster than the text one.
Perhaps you can create views for your data that make it easier to consume, while still using a numeric primary key.
I'm just going to roll with your example. Doug is correct when he says that text is fine. Even for a medium sized (~50gig) database having a 3 letter code be a primary key won't kill the database. If it makes development easier, reduces joins on the other table and it's a field that users would be typing in...I say go for it. Don't do it if it's just an abbreviation that you show on a page or because it makes the athletes table look pretty. I think the key is the question "Is this a code that the user will type in and not just pick from a list?"
Let me give you an example of when I used a text column for a key. I was making software for processing medical claims. After the claim got all digitized a human had to look at the claim and then pick a code for it that designated what kind of claim it was. There were hundreds of codes...and these guys had them all memorized or crib sheets to help them. They'd been using these same codes for years. Using a 3 letter key let them just fly through the claims processing.
I recommend using ints or bigints for primary keys. Benefits include:
This allows for faster joins.
Having no semantic meaning in your primary key allows you to change the fields with semantic meaning without affecting relationships to other tables.
You can always have another column to hold team_code or something for "OAK" and "SD".
The standard answer is to use numbers because they are faster to index; no need to compute a hash or whatever.
If you use a meaningful value as a primary key, you'll have to update it all through your database if the team name changes.
To satisfy the above, but still make the database directly readable,
use a number field as the primary key
immediately create a view Athlete_And_Team that joins the Athlete and Team tables
Then you can use the view when you're going through the data by hand.
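A runnable sketch of that approach, using the athlete and team rows from the question. The team_code column is an assumption (it follows the suggestion elsewhere in this thread to keep "OAK" as an ordinary attribute).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE team (teamid INTEGER PRIMARY KEY, team_code TEXT NOT NULL UNIQUE,
                       teamname TEXT NOT NULL);
    CREATE TABLE athlete (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                          teamid INTEGER NOT NULL REFERENCES team (teamid));
    INSERT INTO team VALUES (28, 'OAK', 'Oakland');
    INSERT INTO athlete VALUES (17, 'Rickey Henderson', 28);

    -- Readable projection for browsing by hand; the keys stay numeric.
    CREATE VIEW Athlete_And_Team AS
        SELECT a.id, a.name, t.team_code, t.teamname
        FROM athlete a JOIN team t ON t.teamid = a.teamid;
    """
)

readable_rows = conn.execute("SELECT * FROM Athlete_And_Team").fetchall()
```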
Are you talking about your primary key or your clustered index? Your clustered index should be the column you will most often use to uniquely identify a row. It also defines the logical ordering of the rows in your table. The clustered index will almost always be your primary key, but there are circumstances where they can be different.
What are the pros/cons for including a date field as a part of a primary key?
Consider a table of parts inventory -- if you want to store the inventory level at the end of each day then a composite primary key on part_id and date_of_day would be fine. You might choose to make that a unique key and add a synthetic primary key, particularly if you have one or more tables referencing it with a foreign key constraint, but that aside, no problem.
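The end-of-day inventory case could be declared as in the sketch below. The table name inventory_level and the quantity column are assumptions; part_id and date_of_day come from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE inventory_level (
        part_id     INTEGER NOT NULL,
        date_of_day TEXT    NOT NULL,   -- ISO 8601 date, one row per part per day
        quantity    INTEGER NOT NULL,
        PRIMARY KEY (part_id, date_of_day)
    )
    """
)
insert = "INSERT INTO inventory_level VALUES (?, ?, ?)"
conn.execute(insert, (1, "2008-12-04", 40))
conn.execute(insert, (1, "2008-12-05", 37))   # same part, next day: accepted

try:
    conn.execute(insert, (1, "2008-12-05", 99))  # second row for the same day
    second_snapshot_rejected = False
except sqlite3.IntegrityError:
    second_snapshot_rejected = True
```

Here the date is a genuine part of the row's identity, so the composite key enforces exactly the rule the table is meant to express.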
So there's nothing necessarily wrong with it, but like any other method it can be used incorrectly as in Patrick's example.
Edit: Here's another comment to add.
I'm reminded of something I wrote a while ago on the subject of whether date values in databases really were natural or synthetic. The readable representation of a date as "YYYY-MM-DD" is certainly natural, but internally in Oracle this is stored as a numeric that just represents that particular date/time to Oracle. We can choose and change the representation of that internal value at any time (to different readable formats, or to a different calendar system entirely) without the internal value losing its meaning as that particular date and time. I think on that basis, a DATE data type lies somewhere between natural and synthetic.
I am OK with it being part of the key, but would add that you should also have an auto-incrementing sequence number as part of the PK, and enforce that any date is written to the database as UTC, with the downstream systems then converting to local time.
A system that I worked on decided that it would be a grand idea to have an Oracle trigger write to an audit table whenever another table was touched, and make the sysdate part of the primary key with no sequence number. The only problem is that sysdate has one-second resolution, so if you run an update query that hits the row more than once per second, it violates the primary key on the table that is recording the change.
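The failure is easy to reproduce in miniature. The sketch below uses SQLite rather than Oracle, and a hard-coded timestamp string stands in for two sysdate readings taken within the same second; the table names audit_by_time and audit_by_seq are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Timestamp alone as the key: breaks on two changes in one second.
    CREATE TABLE audit_by_time (
        changed_at TEXT NOT NULL PRIMARY KEY
    );
    -- Sequence number as the key: the timestamp is just data.
    CREATE TABLE audit_by_seq (
        audit_id   INTEGER PRIMARY KEY AUTOINCREMENT,
        changed_at TEXT NOT NULL
    );
""")

stamp = "2008-12-04 19:34:02"  # stands in for two reads of the clock in one second

conn.execute("INSERT INTO audit_by_time VALUES (?)", (stamp,))
try:
    conn.execute("INSERT INTO audit_by_time VALUES (?)", (stamp,))
    collided = False
except sqlite3.IntegrityError:
    collided = True

for _ in range(2):
    conn.execute("INSERT INTO audit_by_seq (changed_at) VALUES (?)", (stamp,))
seq_rows = conn.execute("SELECT COUNT(*) FROM audit_by_seq").fetchone()[0]
print(collided, seq_rows)
```

With the sequence number in the key, both changes are recorded even though their timestamps are identical.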
If you have already decided to use a 'natural' primary key, then the question is: is the date a necessary part of the primary key, or not? Pros and cons are irrelevant!
There are some questions I'd ask about using a date as part of the primary key.
Does the date include the time portion? This makes things tricky because time includes time zones and daylight savings. This doesn't alter the date/time value, but may produce unexpected results in terms of sorting or retrieving values based upon a query.
I'm a big believer in the use of surrogate keys (i.e. use a sequence column as the primary key) rather than natural keys (like using a date).
A slight con would be that it's not as elegant a handle as some other identifiers
(e.g. saying to a colleague please can you look at record 475663 is a bit easier than saying please can you look at 2008-12-04 19:34:02)
There is also the risk of confusion over different date formats in different locales
(e.g. 4th March 2008 - 4/3/2008 in Europe, 3/4/2008 in USA)
(My preference is always to use a separate key column)
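To make the locale confusion concrete, here is a small sketch using Python's datetime module. The same text parses to two different dates depending on which convention the reader assumes, whereas the ISO form is unambiguous.

```python
from datetime import datetime

text = "4/3/2008"

# Day-first (European) reading versus month-first (US) reading of the same string.
european = datetime.strptime(text, "%d/%m/%Y").date()
american = datetime.strptime(text, "%m/%d/%Y").date()

iso_european = european.isoformat()
iso_american = american.isoformat()
print(iso_european, iso_american)
```

If a date does end up in a key that people quote to each other, writing it in ISO form (YYYY-MM-DD) avoids this particular trap.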
As always.. It depends.
What is your objective of including a date/time column in a PK? Is it to provide additional information about a record without having to actually select the row?
The main problems I can foresee here are the obvious ones: do you use a UTC date or a local date? Will the date be misinterpreted (will someone think it means local time when it means UTC)? As some of the others have suggested, this might be better handled with a surrogate key, with the date in a composite unique key instead. It might also be better for performance to put the date in a key or index other than the primary key.
[Side note] This reminds me of the theory behind a COMB (combined GUID) (1), although the idea there was to create a unique ID for a PK that SQL Server could index better and that required less index rebuilding, rather than to add any meaningful date/time value to a row.
(1) [http://www.informit.com/articles/article.aspx?p=25862&seqNum=7]
Dates make perfectly good primary keys, provided that they make sense as part of the natural key. I would use a date in tables like:
holiday_dates (hol_date date)
employee_salary (employee_id integer, sal_start_date date)
(What would be the point of adding the surrogate employee_salary_id above?)
For some tables, a date could be used but something else makes more sense as the primary key, e.g.:
hotel_room_booking (booking_reference)
We could have used (room_no, booking_from_date) or (room_no, booking_to_date), but a reference is more useful for communicating with the client etc. We might make these into UNIQUE constraints, but in fact we need a more complex "no overlap" check for these.
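One way the "no overlap" check might look, sketched with SQLite via Python. The column names follow the answer; the book() helper and the sample bookings are hypothetical, and the sketch assumes a booking occupies the half-open range from booking_from_date up to but not including booking_to_date, so a guest can arrive on the day another leaves. A check done in application code like this is not safe under concurrent writers without extra locking or a database-level constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE hotel_room_booking (
        booking_reference TEXT PRIMARY KEY,
        room_no           INTEGER NOT NULL,
        booking_from_date TEXT NOT NULL,  -- ISO dates compare correctly as text
        booking_to_date   TEXT NOT NULL
    )
""")

def book(reference, room_no, from_date, to_date):
    """Insert a booking unless it overlaps an existing one for the same room."""
    # Two ranges overlap when each one starts before the other ends.
    clash = conn.execute(
        """SELECT 1 FROM hotel_room_booking
           WHERE room_no = ?
             AND booking_from_date < ?
             AND ? < booking_to_date""",
        (room_no, to_date, from_date),
    ).fetchone()
    if clash:
        return False
    conn.execute(
        "INSERT INTO hotel_room_booking VALUES (?, ?, ?, ?)",
        (reference, room_no, from_date, to_date),
    )
    return True

first = book("BK-1", 101, "2008-12-01", "2008-12-05")
overlapping = book("BK-2", 101, "2008-12-04", "2008-12-07")
back_to_back = book("BK-3", 101, "2008-12-05", "2008-12-08")
print(first, overlapping, back_to_back)
```

Note that neither UNIQUE (room_no, booking_from_date) nor UNIQUE (room_no, booking_to_date) would have rejected BK-2, which is why a separate check is needed.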
A date as the sole or first component of a primary key causes performance problems on tables with a high insert rate (the index will need to be rebalanced frequently).
It also often causes problems if more than one row is inserted per date.
In most situations I consider this a bad smell, and would advise against it.
Nothing particularly wrong with this, but as other posters have noted you could get into problems with time zones and locales. Also you can end up with lots of DATE() functions obfuscating your SQL.
If it is something like inventory at end of day as previously mentioned, you could perhaps consider an eight character text field like "20081202" as the second part of the primary key. This avoids the time zone locale problems and is easy enough to convert into a real date if you need to.
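Converting such an eight-character key to and from a real date is short in most languages. A sketch in Python follows; the helper names parse_day_key and format_day_key are made up for illustration.

```python
from datetime import date, datetime

def parse_day_key(key: str) -> date:
    """Turn an eight-character 'YYYYMMDD' key into a date; raises ValueError if invalid."""
    return datetime.strptime(key, "%Y%m%d").date()

def format_day_key(day: date) -> str:
    """Turn a date back into its 'YYYYMMDD' key."""
    return day.strftime("%Y%m%d")

parsed = parse_day_key("20081202")
round_trip = format_day_key(parsed)

# Because the year comes first and every field is zero-padded,
# plain text ordering matches chronological ordering.
sorts_chronologically = "20081130" < "20081202"
print(parsed, round_trip, sorts_chronologically)
```

The trade-off is that the database will no longer reject an impossible value such as "20081340" unless you add a check yourself.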
Remember the primary key has two functions: to uniquely identify a record and to enforce uniqueness. Surrogate primary keys do neither.
It might be hard to refer to. I came across _ID + _Date as a composite PK. This composite key was also referenced as an FK in another table.
Firstly, it was purely confusing, as the _ID name suggested a non-composite key.
Secondly, inserts to the main table were done with SYSDATE, so to reference a row you had to work out the exact time that SYSDATE had recorded. Unless you match that time precisely, the reference will not work.
Using the date as part of the primary key could make joins on the table significantly slower. I would prefer a surrogate key and then a unique index on the date if need be.
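A rough sketch of that layout, again using SQLite through Python. The daily_report and report_line tables, their columns, and the index name are all hypothetical; the point is only that child tables join on a small integer while the unique index still stops two rows sharing a date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_report (
        report_id   INTEGER PRIMARY KEY,   -- surrogate key used for joins
        report_date TEXT NOT NULL,
        summary     TEXT
    );
    -- The date is no longer the key, but it must still be unique.
    CREATE UNIQUE INDEX ux_daily_report_date ON daily_report (report_date);

    CREATE TABLE report_line (
        line_id   INTEGER PRIMARY KEY,
        report_id INTEGER NOT NULL REFERENCES daily_report(report_id),
        detail    TEXT
    );
""")

cur = conn.execute(
    "INSERT INTO daily_report (report_date, summary) VALUES ('2008-12-04', 'ok')"
)
report_id = cur.lastrowid
conn.execute(
    "INSERT INTO report_line (report_id, detail) VALUES (?, 'first line')",
    (report_id,),
)

# A second report for the same date is still rejected, by the unique index.
try:
    conn.execute("INSERT INTO daily_report (report_date) VALUES ('2008-12-04')")
    date_still_unique = False
except sqlite3.IntegrityError:
    date_still_unique = True
print(report_id, date_still_unique)
```

Whether the joins are actually faster this way depends on the database and the data, so it is worth measuring before restructuring an existing schema.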