Related
This might be too subjective, but it's been puzzling me for some time.
If you have a Fact table that allows duplicates with 10 dimensions that do not, do you really need a primary key?
Why Are There Duplicates?
It's a bit tricky, but ideally each duplicate is actually valid. The source system that records them just doesn't provide a unique identifier to tell them apart. We don't own that system, so there is no way to ever change it.
Data
The data arrives in batches and only includes the previous day's worth of records. Therefore, in the event of a republish, we just drop the entire day's worth of records and republish the new day of records without the use of a primary key.
This is how I would fix bad data.
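The drop-and-republish routine described above can be sketched roughly as follows. This is only an illustration: it uses Python's built-in sqlite3 module rather than SQL Server, and the table name `fact_sales`, its columns, and the helper `republish_day` are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact table with no primary key; duplicate rows are allowed.
conn.execute(
    "CREATE TABLE fact_sales (load_date TEXT, product_id INTEGER, amount REAL)"
)

def republish_day(conn, load_date, rows):
    """Replace every row for load_date with the freshly published batch."""
    with conn:  # one transaction: the delete and the reload commit together
        conn.execute("DELETE FROM fact_sales WHERE load_date = ?", (load_date,))
        conn.executemany(
            "INSERT INTO fact_sales (load_date, product_id, amount) VALUES (?, ?, ?)",
            [(load_date, product_id, amount) for product_id, amount in rows],
        )

# The first publish contains a legitimate duplicate row.
republish_day(conn, "2024-01-01", [(1, 9.99), (1, 9.99), (2, 5.00)])
# A republish of the same day replaces that day's rows wholesale.
republish_day(conn, "2024-01-01", [(1, 9.99), (1, 9.99), (2, 5.50)])

row_count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
amounts = sorted(r[0] for r in conn.execute("SELECT amount FROM fact_sales"))
```

Nothing here needs a row-level key: the load date alone identifies what to throw away, which is the point being made in the question.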
Generate A Primary Key Already
I can, but if it's never used and there is no way to validate whether a duplicate is legit, why do it?
SQL Server database tables do not require a primary key.
A database engine may well create an internal row identifier in the background, though.
Yes, SQL Server doesn't need a primary key. What it mostly needs is a CLUSTERED index, because if you have other NONCLUSTERED indexes on this table, each of them will use the CLUSTERED index key to point to the data. So a primary key is a good example of a clustered key. If it's short and you have other indexes, that's a reason to create it.
I am trying to create a database for hotel reservation system.
In it, the Date, Reserved Time (breakfast, lunch or dinner) and Table Number together form a composite primary key. In Access it's possible to make all three the primary key, but when I try to create a relationship (e.g. with the customer detail table) it's impossible because there is no single unique primary key column in this table.
Is there any solution for this?
https://drive.google.com/file/d/0B5_8M-VhW5zoZ3ExRUlvakU4bzQ/view?usp=sharing
Sorry, I don't have the privileges to add the image directly.
Please be kind enough to refer to this link.
I would recommend that you use an AutoNumber field for the primary key instead of having a composite key. Then you don't have to deal with issues like the Date field being changed in an existing record breaking relationships to other tables.
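A rough sketch of that recommendation, using SQLite through Python's sqlite3 module as a stand-in for Access. The table and column names (`reservation`, `customer_detail`, `res_date`, `meal`, `table_no`) are assumptions made for illustration. The old composite key is kept as a UNIQUE constraint so double bookings are still rejected, while the relationship goes through the auto-numbered column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE reservation (
    reservation_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    res_date       TEXT    NOT NULL,
    meal           TEXT    NOT NULL,
    table_no       INTEGER NOT NULL,
    UNIQUE (res_date, meal, table_no)                  -- the former composite key
);
CREATE TABLE customer_detail (
    customer_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    name           TEXT    NOT NULL,
    reservation_id INTEGER NOT NULL REFERENCES reservation (reservation_id)
);
""")

insert_reservation = (
    "INSERT INTO reservation (res_date, meal, table_no) VALUES (?, ?, ?)"
)
reservation_id = conn.execute(insert_reservation, ("2024-05-01", "dinner", 7)).lastrowid
conn.execute(
    "INSERT INTO customer_detail (name, reservation_id) VALUES (?, ?)",
    ("Smith", reservation_id),
)

# The same table cannot be booked twice for the same date and meal.
try:
    conn.execute(insert_reservation, ("2024-05-01", "dinner", 7))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Changing the date does not disturb the relationship to customer_detail.
conn.execute(
    "UPDATE reservation SET res_date = ? WHERE reservation_id = ?",
    ("2024-05-02", reservation_id),
)
linked_date = conn.execute(
    "SELECT r.res_date FROM customer_detail c "
    "JOIN reservation r ON r.reservation_id = c.reservation_id"
).fetchone()[0]
```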
In tables where you need only one column as the key, and the values in that column can be integers, when shouldn't you use an identity field?
Conversely, in the same table and column, when would you generate its values manually rather than use an autogenerated value for each record?
I guess that would be the case when there are lots of inserts and deletes on the table. Am I right? What other situations could there be?
If you have already settled on the surrogate side of the Great Primary Key Debacle, then I can't find a single reason not to use identity keys. The usual alternatives are GUIDs (they have many disadvantages, primarily from size and randomness) and keys generated in the application layer. But creating a surrogate key in the application layer is a little bit harder than it seems and also does not cover non-application data access (i.e. batch loads, imports, other apps, etc.). The one special case is distributed applications, where GUIDs and even sequential GUIDs may offer a better alternative to site id + identity keys.
I suppose if you are creating a many-to-many linking table, where both fields are foreign keys, you don't need an identity field.
Nowadays I imagine that most ORMs expect there to be an identity field in every table. In general, it is a good practice to provide one.
I'm not sure I understand enough about your context, but I interpret your question to be:
"If I need the database to create a unique column (for whatever reason), when shouldn't it be a monotonically increasing integer (identity) column?"
In those cases, there's no reason to use anything other than the facility provided by the DBMS for the purpose; in your case (SQL Server?) that's an identity.
Except:
If you'll ever need to merge the table with data from another source, use a GUID, which will prevent duplicate keys from colliding.
If you need to merge databases it's a lot easier if you don't have to regenerate keys.
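To make the merge argument concrete, here is a small sketch in Python. The two "sites" and their rows are invented for the example, and `uuid.uuid4()` stands in for whatever GUID facility the database offers (NEWID() in SQL Server).

```python
import uuid

# Two independent sources that each number their rows 1, 2, ... like an identity column.
site_a_identity = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
site_b_identity = [{"id": 1, "name": "carol"}, {"id": 2, "name": "dave"}]

# Merging by key silently overwrites rows, because the keys collide.
merged_identity = {row["id"]: row for row in site_a_identity + site_b_identity}

def new_row(name):
    """Create a row whose key is a random GUID instead of a local counter."""
    return {"id": str(uuid.uuid4()), "name": name}

site_a_guid = [new_row("alice"), new_row("bob")]
site_b_guid = [new_row("carol"), new_row("dave")]

# With GUID keys the same merge keeps every row and nothing needs to be re-keyed.
merged_guid = {row["id"]: row for row in site_a_guid + site_b_guid}
```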
One case of not wanting an identity field would be in a one to one relationship. The secondary table would have as its primary key the same value as the primary table. The only reason to have an identity field in that situation would seem to be to satisfy an ORM.
You cannot (normally) specify values when inserting into identity columns, so for example if the column "id" was specified as an identity the following SQL would fail:
INSERT INTO MyTable (id, name) VALUES (1, 'Smith')
In order to perform this sort of insert you need to have IDENTITY_INSERT set to ON for that table. This is not intended to be on normally, and it can be ON for only one table per session at any point in time.
If I need a surrogate, I would either use an IDENTITY column or a GUID column depending on the need for global uniqueness.
If there is a natural primary key, or the primary key is defined as a unique combination of other foreign keys, then I typically do not have an IDENTITY, nor do I use it as the primary key.
There is an exception, which is snapshot configuration tables which I am tracking with an audit trigger. In this case, there is usually a logical "primary key" (usually date of the snapshot and natural key of the row - like a cost center or gl account number for which the row is a configuration record), but instead of using the natural "primary key" as the primary key, I add an IDENTITY and make that the primary key and make a unique index or constraint on the date and natural key. Although theoretically the date and natural key shouldn't change, in these tables, if a user does that instead of adding a new row and deleting the old row, I want the audit (which reflects a change to a row identified by its primary key) to really reflect a change in the row - not the disappearance of a key and the appearance of a new one.
I recently implemented a Suffix Trie in C# that could index novels, and then allow searches to be done extremely fast, linear to the size of the search string. Part of the requirements (this was a homework assignment) was to use offline storage, so I used MS SQL, and needed a structure to represent a Node in a table.
I ended up with the following structure : NodeID Character ParentID, etc, where the NodeID was a primary key.
I didn't want this to be done as an autoincrementing identity for two main reasons.
How do I get the value of a NodeID after I add it to the database/data table?
I wanted more control when it came to generating my own IDs.
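On the first point, most database APIs hand back the generated key right after the insert. The sketch below shows this with SQLite via Python's sqlite3 module, where the cursor exposes it as `lastrowid`; in SQL Server the usual counterparts are SCOPE_IDENTITY() or an OUTPUT clause. The `node` table and the `add_node` helper are hypothetical simplifications of the structure described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE node (
    node_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    ch        TEXT    NOT NULL,
    parent_id INTEGER REFERENCES node (node_id)
)
""")

def add_node(conn, ch, parent_id=None):
    """Insert one trie node and return the key the database generated for it."""
    cur = conn.execute(
        "INSERT INTO node (ch, parent_id) VALUES (?, ?)", (ch, parent_id)
    )
    return cur.lastrowid

root_id = add_node(conn, "")         # root node carries no character
a_id = add_node(conn, "a", root_id)  # child of the root
b_id = add_node(conn, "b", a_id)     # grandchild, linked through the returned id

parent_of_b = conn.execute(
    "SELECT parent_id FROM node WHERE node_id = ?", (b_id,)
).fetchone()[0]
```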
Classic database table design would include a tableId int identity(1,1) not null, which results in an auto-increment int32 id field.
However, it could be useful to give these numbers some meaning, and I wanted to know what people thought about using a Char(4) field for a table containing enumerated values.
Specifically I was thinking of a User Role table which had data like:
"admn" - "Administrator"
"edit" - "Editor".
I could then reference these 'codes' in my code.
Update
It makes more sense when writing code to see User.IsInRole("admin") rather than User.IsInRole(UserRoles.Admin) where Admin is an int that needs to be updated/synchronised if you ever rebuild your database.
An id field (not associated with the data) is called a surrogate key. They have their advantages and disadvantages. You can see a list of those on this Wikipedia article. Personally I feel that people overuse them and have forgotten (or have never learned) how to properly normalise a database structure.
I always tend to use a surrogate primary key in my tables.
That is, a key that has no meaning in the business domain. A primary key is just an administrative piece of data that is required by the database ...
What would be the advantage of using 'admn' as primary key in this case ?
No. No, no, no, no, and no.
Keys are not data. Keys do not have meaning. That way when meaning changes, keys do not change.
Keys do not have encoded meaning. Encoded meaning is not even potentially possibly maybe useful unless you have an algorithm for decoding it.
Since there's no way to get from "admn" to "Administrator" without a lookup, and since the real meaning, "Administrator", sits right next to the SEKRET ENKODED "useful" key, why would I ever look at the key instead of the real data right next to it in the table?
If you want an abbreviated form, an enum-like name, or what have you, then call it that and realize it's data. That's perfectly acceptable. create table user_role (id int not null primary key, abbv char(4), name varchar(64));
But it's not a key, it doesn't hash like an integer key, and it takes four character compares and a look for the null terminator to compare it to "edtr", as opposed to one subtraction to compare two integers. There's no decent way to generate a next key: what's next in the sequence ('admn', 'edtr', ?)?
So you've lost generate-ability, easy comparison, possibly size (if you could have used, say, a tinyint as your key), and all for an arbitrary value that's of no real use.
Use a synthetic key. If you really need an abbreviation, make that an attribute column.
A primary key is better if it's never going to change. You can change a primary key as long as you update all references to it but it's a bit of a pain.
Sometimes there's no natural non-changing column in a table and, in that case, a surrogate is useful.
But, if you have a natural non-changing value (like an employee ID that's never recycled or a set of roles that you never expect to change), it's better to use that.
Otherwise, you're introducing complexity to cater for something with a minuscule chance of happening.
That's only my opinion, my name isn't Codd or Date, so don't take this as gospel.
I think the answer is in your post. Your example of User.IsInRole("admin") will always return false as you have a primary key of char(4) and used "admn" as key for the administrator.
I would go for a surrogate Primary key which will never ever change and have the option for a 'functional primary key' to query certain roles which are used hardcoded in the code.
A key should preferably not have any special meaning in itself. Conditions tend to change, so you may have to change a key if its meaning changes, and changing keys is not something that you want to do.
Also, when the amount of information that you can put in the key is so limited, there's not much point in having it there.
You are comparing auto-increment vs. the fixed char key, but in that scenario you don't want an auto-incremented id.
There are different routes to go. One is to use an enum, that maps to int ids. These are not auto incremented and are the primary key of the table.
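One way to picture the enum route, as a sketch only: the ids are fixed in code and written to the table explicitly, so they do not depend on insertion order and survive a database rebuild. The names `UserRole`, `user_role`, `app_user` and `is_in_role` are hypothetical, and SQLite via Python's sqlite3 module stands in for the real database.

```python
import sqlite3
from enum import IntEnum

class UserRole(IntEnum):
    """Role ids owned by the code; they are not auto-incremented."""
    ADMIN = 1
    EDITOR = 2

ROLE_NAMES = {UserRole.ADMIN: "Administrator", UserRole.EDITOR: "Editor"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_role (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# Seed the lookup table from the enum so code and database stay in step.
conn.executemany(
    "INSERT INTO user_role (id, name) VALUES (?, ?)",
    [(int(role), name) for role, name in ROLE_NAMES.items()],
)
conn.execute(
    "CREATE TABLE app_user ("
    "id INTEGER PRIMARY KEY, login TEXT NOT NULL, "
    "role_id INTEGER NOT NULL REFERENCES user_role (id))"
)
conn.execute(
    "INSERT INTO app_user (login, role_id) VALUES (?, ?)",
    ("smith", int(UserRole.ADMIN)),
)

def is_in_role(conn, login, role):
    """Return True if the user with this login has the given role."""
    row = conn.execute(
        "SELECT role_id FROM app_user WHERE login = ?", (login,)
    ).fetchone()
    return row is not None and row[0] == role

admin_name = conn.execute(
    "SELECT name FROM user_role WHERE id = ?", (int(UserRole.ADMIN),)
).fetchone()[0]
```

Calling `is_in_role(conn, "smith", UserRole.ADMIN)` reads almost as clearly as the string version while the key stays a plain integer.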
Hard-coded => database references are always a bad idea! (Unless you set them before application start, etc.)
Beside this: Should mappings from admn=>administrator really be done in the database?
And you can only store 23^4 - (keywords) entries in a varchar(4) with a 23-character Latin alphabet.
If you use an unrelated number as the primary key, it's called a surrogate key, and one class of developers believes it's the best practice.
http://en.wikipedia.org/wiki/Surrogate_key
If you use "admn" as the primary key, then it's called a natural key, and a different class of developers believes it's the best practice.
https://en.wikipedia.org/wiki/Natural_key
The two camps are never going to agree. It's an age-old religious war (surrogate key vs natural key). Therefore use what you are most comfortable with. Just know that there are as many who support your view as there are who disagree, no matter which one you choose.
What are the pros/cons for including a date field as a part of a primary key?
Consider a table of parts inventory -- if you want to store the inventory level at the end of each day then a composite primary key on part_id and date_of_day would be fine. You might choose to make that a unique key and add a synthetic primary key, particularly if you have one or more tables referencing it with a foreign key constraint, but that aside, no problem.
So there's nothing necessarily wrong with it, but like any other method it can be used incorrectly as in Patrick's example.
Edit: Here's another comment to add.
I'm reminded of something I wrote a while ago on the subject of whether date values in databases really were natural or synthetic. The readable representation of a date as "YYYY-MM-DD" is certainly natural, but internally in Oracle this is stored as a numeric that just represents that particular date/time to Oracle. We can choose and change the representation of that internal value at any time (to different readable formats, or to a different calendar system entirely) without the internal value losing its meaning as that particular date and time. I think on that basis, a DATE data type lies somewhere between natural and synthetic.
I am OK with it being part of the key, but would add that you should also have an auto-incrementing sequence number be a part of the PK, and enforce that any date is written to the database as UTC, with the downstream systems then converting to local time.
A system that I worked in decided that it would be a grand idea to have an Oracle trigger write to a database whenever another table was touched, and make the sysdate be part of the primary key with no sequence number. Only problem is that if you run an update query that hits the row more than once per second, it breaks the primary key on the table that is recording the change.
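The failure is easy to reproduce in miniature. The sketch below uses SQLite via Python's sqlite3 module instead of Oracle, with a fixed timestamp in place of SYSDATE so the collision is deterministic; the table names are invented for the example.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
# Audit table keyed only on a second-precision timestamp.
conn.execute("CREATE TABLE audit_by_time (changed_at TEXT PRIMARY KEY, note TEXT)")
# Same data, but a sequence number carries the uniqueness.
conn.execute(
    "CREATE TABLE audit_by_seq ("
    "seq INTEGER PRIMARY KEY AUTOINCREMENT, changed_at TEXT NOT NULL, note TEXT)"
)

# Two changes that land within the same second share this value.
stamp = datetime(2008, 12, 4, 19, 34, 2).isoformat(sep=" ")

conn.execute("INSERT INTO audit_by_time VALUES (?, ?)", (stamp, "first update"))
try:
    conn.execute("INSERT INTO audit_by_time VALUES (?, ?)", (stamp, "second update"))
    collided = False
except sqlite3.IntegrityError:
    collided = True  # the second change in the same second violates the key

for note in ("first update", "second update"):
    conn.execute(
        "INSERT INTO audit_by_seq (changed_at, note) VALUES (?, ?)", (stamp, note)
    )
seq_rows = conn.execute("SELECT COUNT(*) FROM audit_by_seq").fetchone()[0]
```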
If you have already decided to use a 'natural' primary key, then the question is: is the date a necessary part of the primary key, or not? Pros/cons are irrelevant!
There are some questions I'd ask about using a date as part of the primary key.
Does the date include the time portion? This makes things tricky because time includes time zones and daylight savings. This doesn't alter the date/time value, but may produce unexpected results in terms of sorting or retrieving values based upon a query.
I'm a big believer in the use of surrogate keys (i.e. use a sequence column as the primary key) rather than natural keys (like using a date).
A slight con would be that it's not as elegant a handle as some other identifiers
(e.g. saying to a colleague please can you look at record 475663 is a bit easier than saying please can you look at 2008-12-04 19:34:02)
There is also the risk of confusion over different date format in different locales
(e.g. 4th March 2008 - 4/3/2008 in Europe, 3/4/2008 in USA)
(My preference is always to use a separate key column)
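The locale confusion mentioned above is easy to demonstrate. In this small Python sketch the same text is parsed under the two conventions and yields two different dates; the variable names are just for illustration.

```python
from datetime import datetime

text = "4/3/2008"

# Day first, as commonly written in Europe: 4th March 2008.
european = datetime.strptime(text, "%d/%m/%Y").date()
# Month first, as commonly written in the USA: April 3rd 2008.
american = datetime.strptime(text, "%m/%d/%Y").date()

# An ISO 8601 rendering has a single reading regardless of locale.
european_iso = european.isoformat()
american_iso = american.isoformat()
```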
As always.. It depends.
What is your objective of including a date/time column in a PK? Is it to provide additional information about a record without having to actually select the row?
The main problems I can foresee here are the obvious ones, i.e. do you use a UTC date or a local date? Will the date be misinterpreted (will someone think it means local time when it means UTC)? As some of the others have suggested, this might be better used in a surrogate/composite key instead. It might be better for your performance to use it in a key or index other than the primary key.
[Side note] This kind of reminds me of the theory behind a COMB (combined GUID) (1), although the idea there was to create a unique ID for a PK that SQL Server indexes better and that requires less index rebuilding, rather than to add any meaningful date/time value to a row.
(1) [http://www.informit.com/articles/article.aspx?p=25862&seqNum=7]
Dates make perfectly good primary keys, provided that they make sense as part of the natural key. I would use a date in tables like:
holiday_dates (hol_date date)
employee_salary (employee_id integer, sal_start_date date)
(What would be the point of adding the surrogate employee_salary_id above?)
For some tables, a date could be used but something else makes more sense as the primary key, e.g.:
hotel_room_booking (booking_reference)
We could have used (room_no, booking_from_date) or (room_no, booking_to_date), but a reference is more useful for communicating with the client etc. We might make these into UNIQUE constraints, but in fact we need a more complex "no overlap" check for these.
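A minimal sketch of such a "no overlap" check, written in Python for clarity. It assumes bookings are half-open intervals, so a guest may check in on the day the previous guest checks out; that convention, the tuple layout and the function names are assumptions rather than anything stated above.

```python
from datetime import date

def overlaps(a_from, a_to, b_from, b_to):
    """Two half-open date ranges [from, to) overlap if each starts before the other ends."""
    return a_from < b_to and b_from < a_to

def can_book(existing, room_no, new_from, new_to):
    """existing is a list of (room_no, booking_from_date, booking_to_date) tuples."""
    return not any(
        room == room_no and overlaps(b_from, b_to, new_from, new_to)
        for room, b_from, b_to in existing
    )

bookings = [(101, date(2008, 12, 1), date(2008, 12, 5))]

# Different start and end dates, so a UNIQUE constraint on either column
# would accept this booking, yet it clashes with the existing one.
clash_allowed = can_book(bookings, 101, date(2008, 12, 3), date(2008, 12, 7))
# Starting on the checkout day is fine under the half-open convention.
back_to_back_allowed = can_book(bookings, 101, date(2008, 12, 5), date(2008, 12, 8))
# Another room is unaffected.
other_room_allowed = can_book(bookings, 102, date(2008, 12, 3), date(2008, 12, 7))
```

In a real system the check would have to run inside the database or under a lock, otherwise two concurrent bookings could both pass it.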
A date as the sole or first component of a primary key causes performance problems on tables with a high insert rate (the table will need to be rebalanced frequently).
It also often causes an issue if more than one row is inserted per date.
In most situations I consider this a bad smell, and would advise against it.
Nothing particularly wrong with this, but as other posters have noted you could get into problems with time zones and locales. Also you can end up with lots of DATE() functions obfuscating your SQL.
If it is something like inventory at end of day as previously mentioned, you could perhaps consider an eight character text field like "20081202" as the second part of the primary key. This avoids the time zone locale problems and is easy enough to convert into a real date if you need to.
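Converting between such an eight-character key and a real date is a one-liner each way. A small Python sketch, with hypothetical helper names `to_key` and `from_key`:

```python
from datetime import date, datetime

def to_key(d):
    """Render a date as an eight-character YYYYMMDD key."""
    return d.strftime("%Y%m%d")

def from_key(key):
    """Parse a YYYYMMDD key back into a date."""
    return datetime.strptime(key, "%Y%m%d").date()

key = to_key(date(2008, 12, 2))
round_trip = from_key(key)

# Because the most significant part comes first, plain text ordering
# of the keys is also chronological ordering.
keys = ["20081202", "20071231", "20081130"]
sorted_keys = sorted(keys)
```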
Remember, the primary key has two functions: to uniquely identify a record and to enforce uniqueness. Surrogate primary keys do neither.
It might be hard to refer to. I came across _ID + _Date as a composite PK. This composite key was also a reference/FK in another table.
Firstly it was purely confusing as there was _ID that suggested a non-composite key.
Secondly, inserts to the main table were done with SYSDATE, and one needed to figure out the precise time stored in that SYSDATE. You need to be precise about the time it contains when you refer to it. Otherwise it will not work...
Using the date as part of the primary key could make joins on the table significantly slower. I would prefer a surrogate key and then a unique index on the date if need be.