I have alphanumeric data with a max length of 20 characters. I'm going to store this data in a column with type NVARCHAR(20).
These data are CODES and must be unique, so I decided to make it a primary key column.
But, asking another question, someone "suggested" that I use an INT column as the primary key.
What do you think? Should I use an INT primary key and add a column with a UNIQUE constraint, or keep my current design?
I think I would be adding a column that I'm never going to use, because I need the NVARCHAR(20) column to search and to avoid duplicates. In other words, 99% of my WHERE clauses will filter on that NVARCHAR column.
I am a strong fan of numeric, synthetic primary keys. The code you describe can still be declared unique and kept as an ordinary attribute of the table.
Here are some reasons:
Numeric keys occupy 4 or 8 bytes and are of fixed length. This is more efficient for building indexes.
Numeric keys are often shorter than string keys. This saves space for foreign key references.
Synthetic keys are usually inserted using auto-incrementing columns. This lets you know the insert order. Note: In some applications, exposing the order may be a drawback, but those are unusual.
If the value of the unique string changes, you only have to change it in one place, rather than in every table with a foreign key reference. And if you miss one of those references, the integrity of the database is at risk.
If a row is identified by a combination of several columns, then a single numeric key is more efficient to reference.
A synthetic key can help in maintaining security.
These are just guidelines -- your question is why synthetic numeric keys are a good idea. There are competing considerations. For instance, if space usage is a really big concern, then the additional space for a numeric key plus a unique index may override the other points.
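To make the "synthetic key plus unique code" design concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely so it can be run anywhere; the question is about SQL Server, where the types would be INT IDENTITY and NVARCHAR(20) instead. The table names product and order_line are made up for illustration.

```python
import sqlite3

# Hypothetical schema: "product" and "order_line" are illustrative names only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE product (
    id   INTEGER PRIMARY KEY,          -- synthetic key, assigned by the engine
    code TEXT NOT NULL UNIQUE          -- the business code, at most 20 characters
         CHECK (length(code) <= 20)
);
CREATE TABLE order_line (
    id         INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES product(id)
);
""")

conn.execute("INSERT INTO product (code) VALUES ('ABC-001')")
product_id = conn.execute(
    "SELECT id FROM product WHERE code = ?", ("ABC-001",)
).fetchone()[0]
conn.execute("INSERT INTO order_line (product_id) VALUES (?)", (product_id,))

# The unique constraint still rejects a duplicate code.
try:
    conn.execute("INSERT INTO product (code) VALUES ('ABC-001')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Renaming the code touches one row; order_line keeps pointing at the same id.
conn.execute("UPDATE product SET code = 'ABC-002' WHERE id = ?", (product_id,))
joined_code = conn.execute("""
    SELECT p.code FROM order_line AS ol
    JOIN product AS p ON p.id = ol.product_id
""").fetchone()[0]
```

Searching by code still works exactly as before, because the UNIQUE constraint is backed by an index on that column.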
I have a unique constraint on 5 nullable columns that together identify one row.
Is it okay to create a unique key and put the clustered index on it instead of using a primary key? I cannot use a primary key on these columns because they are nullable, and I cannot create an identity column because there are a lot of deletes and inserts, which would overflow the identity column.
Yes, and there's an argument that this is actually "better" than a primary key, as the rule that a primary key column is non nullable is in many ways an artificial constraint.
If you make it a UNIQUE CLUSTERED INDEX then you get just about everything that a primary key brings to the table except the unwanted rule that the columns must be non nullable. However, they must still be unique, so you could only ever have one row where all five columns in your index are null for example.
So you can use your index when creating foreign key constraints, you will guarantee the order in which data is stored, and each row must be unique. However, the index probably won't be especially useful for querying, and because it's going to be wide and you said there are a lot of deletes/inserts, it will tend to fragment your data.
Personally, I would be tempted to make it a unique constraint, but not clustered. Then it will still do the job of preventing duplicate rows from being created.
You could then add a surrogate key and make this the primary key. I doubt you would ever "run out" (or "overflow"?) of numbers doing this. A BIGINT identity tops out at 2^63 - 1 (about 9.2 x 10^18), which at a million inserts per second would take roughly 292,000 years to exhaust.
So why would I use a surrogate key?
Your surrogate key will be much narrower, so less impact from fragmentation due to so many inserts/updates/deletes.
It's then useful if you need to extend your database. Say you only have one table, and this is always going to be the only table in the entire database. In this one scenario it would make sense to not bother with a surrogate key. It doesn't give you any value; it's just an unnecessary overhead.
However, let's assume that you have other tables hanging off your "main" table (the one with 5 columns forming a unique key). Adding a surrogate key here allows you to make any child tables with a single id that links back to the parent table. The alternative would be to enforce the addition of ALL five columns forming the unique (candidate) key every time you create a child table.
Now you have a narrow clustered index that actually serves a purpose, and the fragmentation will not be quite as bad as it would with five columns.
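As a rough sketch of that layout: the five-column key stays as a unique constraint, and child tables carry only the narrow surrogate id. This uses Python's sqlite3 so it is runnable as-is; table and column names are invented, SQLite has no clustered/non-clustered choice like SQL Server, and it treats NULLs in a UNIQUE constraint differently, so the example sticks to non-NULL values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE parent (
    id INTEGER PRIMARY KEY,            -- narrow surrogate key
    a TEXT, b TEXT, c TEXT, d TEXT, e TEXT,
    UNIQUE (a, b, c, d, e)             -- wide candidate key, kept as a constraint
);
CREATE TABLE child (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER NOT NULL REFERENCES parent(id),  -- one column, not five
    note      TEXT
);
""")

cur = conn.execute(
    "INSERT INTO parent (a, b, c, d, e) VALUES ('x', 'y', 'z', 'p', 'q')"
)
parent_id = cur.lastrowid
conn.execute(
    "INSERT INTO child (parent_id, note) VALUES (?, 'first')", (parent_id,)
)

# The five-column combination is still protected against duplicates.
try:
    conn.execute(
        "INSERT INTO parent (a, b, c, d, e) VALUES ('x', 'y', 'z', 'p', 'q')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A child row cannot point at a parent id that does not exist.
try:
    conn.execute("INSERT INTO child (parent_id, note) VALUES (999, 'orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```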
I am a little confused about the primary key and ID in a database table.
I want to identify the rows in my table by a logical id such as pay_xyz123, pay_xyz124, order_xyz, etc. If I use this format as the PK, which is a string, would it affect performance?
Or should I use auto-increment numbers as the PK and ids like pay_xyz123 as a unique key? What would be the best approach?
Edit: the logical ids can be long strings, say 15 characters.
In general, I prefer having synthetic ids (i.e. auto-increment/serial/identity column) for such tables. This has some advantages:
You can update the entity id easily, because it is an attribute and not used for foreign key references.
Integers are (slightly) more efficient for indexing purposes.
A synthetic id hides entity name information in the tables that reference the key.
It also allows things like soft deletes -- where deletion is by a flag rather than removing the row -- with an insert using the same id. Of course, you have to adjust the uniqueness constraint to allow this.
Of course, there is a slight overhead to storing the auto-incremented key. This increases the size of the base table. Usually string names are longer (as in your example), so this is more than offset by having a reduced length in the rows that refer to the entity.
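The soft-delete point is the least obvious one, so here is one possible way to "adjust the uniqueness constraint". This is only a sketch: it reads "the same id" as the logical id (pay_xyz123), uses a partial unique index, and runs on Python's sqlite3. Other engines have their own syntax for this (filtered indexes in SQL Server, for instance), and the table and column names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payment (
    id         INTEGER PRIMARY KEY,
    logical_id TEXT NOT NULL,               -- e.g. 'pay_xyz123'
    is_deleted INTEGER NOT NULL DEFAULT 0   -- soft-delete flag
);
-- Uniqueness is enforced only among rows that are still live.
CREATE UNIQUE INDEX payment_live_logical_id
    ON payment (logical_id) WHERE is_deleted = 0;
""")

conn.execute("INSERT INTO payment (logical_id) VALUES ('pay_xyz123')")
conn.execute("UPDATE payment SET is_deleted = 1 WHERE logical_id = 'pay_xyz123'")
# Re-using the logical id is allowed once the old row is flagged as deleted.
conn.execute("INSERT INTO payment (logical_id) VALUES ('pay_xyz123')")

# A second live row with the same logical id is still rejected.
try:
    conn.execute("INSERT INTO payment (logical_id) VALUES ('pay_xyz123')")
    second_live_rejected = False
except sqlite3.IntegrityError:
    second_live_rejected = True

rows = conn.execute(
    "SELECT id, is_deleted FROM payment ORDER BY id"
).fetchall()
```

Foreign keys that reference the synthetic id are unaffected: the deleted row and its replacement have different ids.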
If you are using your key for foreign keys into other tables, I personally would use a numeric auto increment id. You can still place an alternate index on your logical key and even a unique constraint on the logical key if the business rules warrant.
The downside, in my opinion, of using the logical key for foreign keys, is that if the logical key changes, then you have to update all of the foreign keys.
If there is no single column that can identify each row in the table,
then my primary key will be a set of at least two fields.
Is that correct?
If so, when I draw the relationship diagram, do I have to underline both attributes that form the primary key?
Thank you.
Here is some terminology:
A superkey is a set of columns that, taken together, uniquely identify rows.
A candidate key (or just: "key") is a minimal [1] superkey. Sometimes a key contains just one column, sometimes it contains several (in which case it is called "composite").
For practical reasons, we classify keys as either primary or alternate. One table has one primary key and zero or more alternate keys.
A key is "natural" if it arises from the intrinsic properties of data. In other words, it "means" something.
A key is "surrogate" if it doesn't have any meaning by itself - it is there only for identification purposes. It's typically implemented as an auto-incrementing integer, but there may be other strategies such as GUIDs (useful for replication). It is quite common for natural keys to be composite, but that almost never happens for surrogates.
If there are no "obvious" natural keys, the whole row can always act as a key [2]. However, this is rarely practical and in such cases you'll typically introduce a surrogate key just for the purpose of identifying rows.
Sometimes, but not always, it is useful to introduce a surrogate in addition to the existing natural key(s).
An ER diagram will clearly identify the PK [3], whether it is natural or surrogate and whether it is composite or not. Exactly how this looks depends on the notation being used, but the PK will typically be drawn in a graphically distinct manner and possibly prefixed with "PK".
[1] I.e. if you were to remove any column from it, it would no longer be unique.
[2] A database table is a physical representation of the mathematical concept of a "relation". Since a relation is a set, there is no purpose in having two identical rows, so at the very least the whole row must be unique (an element is either in the set or it isn't - it cannot be in the set "twice", as opposed to a multiset).
[3] Assuming the diagram is not purely entity-level, in which case no attributes are shown at all.
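The superkey / candidate key definitions can be checked mechanically against a sample of rows. The sketch below is only an illustration with made-up data and made-up function names. Bear in mind that a real key is a rule about all data the table may ever hold, so a sample can disprove a key but never prove one.

```python
from itertools import combinations


def is_superkey(rows, columns):
    """True if the given columns, taken together, identify every row uniquely."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True


def candidate_keys(rows, all_columns):
    """Return the minimal superkeys for this sample of rows, smallest first."""
    found = []
    for size in range(1, len(all_columns) + 1):
        for combo in combinations(all_columns, size):
            # Skip supersets of a key already found: they are not minimal.
            if any(set(k) <= set(combo) for k in found):
                continue
            if is_superkey(rows, combo):
                found.append(combo)
    return found


# Made-up sample data for illustration.
enrolment = [
    {"student": "ann", "course": "db", "room": "r1"},
    {"student": "ann", "course": "os", "room": "r2"},
    {"student": "bob", "course": "db", "room": "r1"},
    {"student": "bob", "course": "os", "room": "r1"},
]
columns = ["student", "course", "room"]

whole_row_is_superkey = is_superkey(enrolment, columns)
keys = candidate_keys(enrolment, columns)
```

Here no single column is unique, the whole row is a superkey, and the only minimal one in this sample is the composite (student, course).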
You are correct, after a fashion. Technically, a primary key and a unique key can be two distinct things. You can have a primary key on a table that uniquely identifies each entity and, on the same table, a unique key constraint that ensures no two rows share the same values for columns you choose. So you can have both a primary key and a unique constraint on the same table. Simply have a primary key column that is autogenerated by your DB, and then pick the two columns in your table on which you want to enforce the unique key constraint.
If you don't have a primary key you can still identify your data, but it won't perform well.
As a best practice, put a primary key on your table.
The preference is to use an auto-increment column as the primary key.
What is the difference between a primary key and a unique key constraint?
What is each one used for?
Both are used to denote candidate keys for a table.
You can only have one primary key for a table so would just need to pick one if you have multiple candidates.
Either can be used in Foreign Key constraints. In SQL Server the Primary Key columns cannot be nullable. Columns used in Unique Key constraints can be.
By default in SQL Server the Primary Key will become the clustered index if it is created on a heap but it is by no means mandatory that the PK and clustered index should be the same.
A primary key is one which is used to identify the row in question. It might also have some meaning beyond that (if there was already a piece of "real" data that could serve) or it may be purely an implementation artefact (most IDENTITY columns, and equivalent auto-incremented values on other database systems).
A unique key is a more general case, where a key cannot have repeated values. In most cases people cannot have the same social security numbers in relation to the same jurisdiction (an international case could differ). Hence if we were storing social security numbers, then we would want to model them as unique, as any case of them matching an existing number is clearly wrong. Usernames generally must be unique also, so here's another case. External identifiers (identifiers used by another system, standard or protocol) tend to also be unique, e.g. there is only one language that has a given ISO 639 code, so if we were storing ISO 639 codes we would model that as unique.
This uniqueness can also be across more than one column. For example, in most hierarchical categorisation systems (e.g. a folder structure) no item can have both the same parent item and the same name, though there could be other items with the same parent and different names, and others with the same name and different parents. This multi-column capability is also present on primary keys.
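The folder example might look like this as a two-column unique constraint. This is a sketch on Python's sqlite3 with invented names; using parent_id 0 for "no parent" is an assumption made here only to sidestep NULL handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE folder (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER NOT NULL,     -- 0 stands in for "no parent" in this sketch
    name      TEXT NOT NULL,
    UNIQUE (parent_id, name)
)
""")
conn.execute("INSERT INTO folder (parent_id, name) VALUES (0, 'docs')")
conn.execute("INSERT INTO folder (parent_id, name) VALUES (0, 'src')")   # same parent, new name
conn.execute("INSERT INTO folder (parent_id, name) VALUES (1, 'docs')")  # same name, new parent

# Same parent AND same name is the only combination that is refused.
try:
    conn.execute("INSERT INTO folder (parent_id, name) VALUES (0, 'docs')")
    clash_rejected = False
except sqlite3.IntegrityError:
    clash_rejected = True

folder_count = conn.execute("SELECT COUNT(*) FROM folder").fetchone()[0]
```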
A table may also have more than one unique key. E.g. a user may have both an id number and a username, and both will need to be unique.
Any non-nullable unique key can therefore serve as a primary key. Sometimes primary keys that come from the innate data being modelled are referred to as "natural primary keys", because they are a "natural" part of the data, rather than just an implementation artefact. The decision as to which to use depends on a few things:
Likelihood of change of specification. If we modelled a social security number as unique and then had to adapt to allow for multiple jurisdictions where two or more use a similar enough numbering system to allow for collisions, we would likely just need to remove the uniqueness constraint (other changes may be needed). If it was our primary key, we would now also need a new primary key, and would have to change any table that was using that primary key as part of a relationship, and any query that joined on it.
Speed of look-up. Key efficiency can be important, as they are used in many WHERE clauses and (more often) in many JOINs. With JOINs in particular, speed of lookup can be vital. The impact will depend on implementation details, and different databases vary according to how they will handle different datatypes (I would have few qualms from a performance perspective in using a large piece of text as a primary key in Postgres where I could specify the use of hash joins, but I'd be very hesitant to do so in SQL Server [Edit: for "large" I'm thinking of perhaps the size of a username, not something the size of the entire Norse Eddas!]).
Frequency of the key being the only interesting data. For example, with a table of languages, and a table of pieces of comments in that language, very often the only reason I would want to join on the language table when dealing with the comments table is either to obtain the language code or to restrict a query to those with a particular language code. Other information about the language is likely to be much more rarely used. In this case while joining on the code is likely to be less efficient than joining on a numeric id set from an IDENTITY column, having the code as the primary key - and hence as what is stored in the foreign key column on the comments table - will remove the need for any JOIN at all, with a considerable efficiency gain. More often though I want more information from the relevant tables than that, so making the JOIN more efficient is more important.
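The last point, where a natural key removes the need for a JOIN altogether, can be sketched like this. It is again Python's sqlite3 with invented table names; whether the trade-off pays off depends on your engine and workload.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE language (
    code TEXT PRIMARY KEY NOT NULL,    -- ISO 639 code used as a natural key
    name TEXT NOT NULL
);
CREATE TABLE comment (
    id            INTEGER PRIMARY KEY,
    language_code TEXT NOT NULL REFERENCES language(code),
    body          TEXT NOT NULL
);
INSERT INTO language VALUES ('en', 'English'), ('is', 'Icelandic');
INSERT INTO comment (language_code, body) VALUES
    ('en', 'hello'), ('is', 'takk'), ('en', 'goodbye');
""")

# The code is already stored in the child row, so filtering needs no JOIN.
english = [row[0] for row in conn.execute(
    "SELECT body FROM comment WHERE language_code = 'en' ORDER BY id"
)]
```

With a numeric surrogate on language, the same query would have to join comment to language just to translate 'en' into an id.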
Primary key:
A primary key uniquely identifies each row in a table.
A primary key does not allow duplicate values or NULLs.
By default (in SQL Server), a primary key is backed by a clustered index.
A table can have only one primary key.
Unique Key:
A unique key also uniquely identifies each row in a table.
A unique key does not allow duplicate values, but it does allow NULL (at most one in SQL Server; some other engines allow several).
By default, a unique key is backed by a non-clustered index.
This is a fruitful link to understand the Primary Key Database Keys.
Keep in mind that a table can have only one clustered index [talking about SQL Server 2005].
Now if we want to make another column unique, we use a unique key,
because a table can have more than one unique key.
A primary key is just any one candidate key. In principle primary keys are not different from any other candidate key because all keys are equal in the relational model.
SQL however has two different syntaxes for implementing candidate keys: the PRIMARY KEY constraint and the UNIQUE constraint (on non-nullable columns of course). In practice they achieve exactly the same thing except for the essentially useless restriction that a PRIMARY KEY can only be used once per table whereas a UNIQUE constraint can be used multiple times.
So there is no fundamental "use" for the PRIMARY KEY constraint. It is redundant and could easily be ignored or dropped from the language altogether. However, many people find it convenient to single out one particular key per table as having special significance. There is a very widely observed convention that keys designated with PRIMARY KEY are used for foreign key references, although this is entirely optional.
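To illustrate that a foreign key does not have to point at the PRIMARY KEY, here is a small sketch in which a child table references a UNIQUE NOT NULL column instead. It runs on Python's sqlite3; the account and login_event tables are hypothetical, and other engines may have extra rules about which keys can be referenced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE account (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE      -- a second candidate key
);
CREATE TABLE login_event (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL REFERENCES account(username)
);
INSERT INTO account (username) VALUES ('alice');
""")

# References the UNIQUE column, not the PRIMARY KEY, and is accepted.
conn.execute("INSERT INTO login_event (username) VALUES ('alice')")

# The constraint is still enforced for values missing from the parent.
try:
    conn.execute("INSERT INTO login_event (username) VALUES ('mallory')")
    unknown_rejected = False
except sqlite3.IntegrityError:
    unknown_rejected = True

event_count = conn.execute("SELECT COUNT(*) FROM login_event").fetchone()[0]
```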
Short version:
From the point of view of database theory, there is none. Both are simply candidate keys.
In practice, most DBMSs like to have one "standard key", which can be used for e.g. deciding how to store data, and to tell tools and DB clients which is the best way to identify a record.
So distinguishing one unique key as the "primary key" is just an implementation convenience (but an important one).
I'm currently designing a brand new database. In school, we always learned to put a primary key in each table.
I read a lot of articles/discussions/newsgroup posts saying that it's better to use a unique constraint (aka unique index for some DBs) instead of a PK.
What's your point of view?
A Primary Key is really just a candidate key that does not allow for NULL. As such, in SQL terms - it's no different than any other unique key.
However, for our non-theoretical RDBMSs, you should have a Primary Key - I've never heard it argued otherwise. If that Primary Key is a surrogate key, then you should also have unique constraints on the natural key(s).
The important bit to walk away with is that you should have unique constraints on all the candidate (whether natural or surrogate) keys. You should then pick the one that is easiest to reference in a Foreign Key to be your Primary Key*.
You should also have a clustered index**. This could be your Primary Key, or a natural key - but it's not required to be either. You should pick your clustered index based on query usage of the table. When in doubt, the Primary Key is not a bad first choice.
* Though it's technically only required to refer to a unique key in a foreign key relationship, it's accepted standard practice to greatly favor the primary key. In fact, I wouldn't be surprised if some RDBMSs only allow primary key references.
** Edit: It's been pointed out that Oracle's terms "clustered table" and "clustered index" mean something different than in SQL Server. The equivalent of what I'm speaking of in Oracle-ese is an index-organized table, and it is recommended for OLTP tables - which, I think, would be the main focus of SO questions. I assume if you're responsible for a large OLAP data warehouse, you should already have your own opinions on database design and optimization.
Can you provide references to these articles?
I see no reason to change the tried and true methods. After all, Primary Keys are a fundamental design feature of relational databases.
Using UNIQUE to serve the same purpose sounds really hackish to me. What is their rationale?
Edit: My attention just got drawn back to this old answer. Perhaps the discussion that you read regarding PK vs. UNIQUE dealt with people making something a PK for the sole purpose of enforcing uniqueness on it. The answer to this is: if it IS a key, then make it a key; otherwise make it UNIQUE.
A primary key is just a candidate key (unique constraint) singled out for special treatment (automatic creation of indexes, etc).
I expect that the folks who argue against them see no reason to treat one key differently than another. That's where I stand.
[Edit] Apparently I can't comment even on my own answer without 50 points.
@chris: I don't think there's any harm. "Primary Key" is really just syntactic sugar. I use them all the time, but I certainly don't think they're required. A unique key is required, yes, but not necessarily a Primary Key.
It would be very rare denormalization that would make you want to have a table without a primary key. Primary keys have unique constraints automatically just by their nature as the PK.
A unique constraint would be used when you want to guarantee uniqueness in a column in ADDITION to the primary key.
The rule of always having a PK is a good one.
http://msdn.microsoft.com/en-us/library/ms191166.aspx
You should always have a primary key.
However, I suspect your question is just worded a bit misleadingly, and you actually mean to ask whether the primary key should always be an automatically generated number (also known as a surrogate key), or some unique field which is actual meaningful data (also known as a natural key), like SSN for people, ISBN for books and so on.
This question is an age-old religious war in the DB field.
My take is that natural keys are preferable if they indeed are unique and never change. However, you should be careful: even something seemingly stable like a person's SSN may change under certain circumstances.
Unless the table is a temporary table to stage the data while you work on it, you always want to put a primary key on the table and here's why:
1 - a unique constraint can allow nulls but a primary key never allows nulls. If you run a query with a join on columns with null values you eliminate those rows from the resulting data set because null is not equal to null. This is how even big companies can make accounting errors and have to restate their profits. Their queries didn't show certain rows that should have been included in the total because there were null values in some of the columns of their unique index. Shoulda used a primary key.
2 - a unique index will automatically be placed on the primary key, so you don't have to create one.
3 - some database engines (SQL Server, for example) will by default put a clustered index on the primary key, which can make queries faster because the rows are stored in key order in the data blocks. (This can be altered to place the clustered index on a different index if that would speed up the queries.) If a table doesn't have a clustered index, the rows won't be stored in key order, which can make queries slower because the read/write head has to travel all over the disk to pick up the data.
4 - many front end development environments require a primary key in order to update the table or make deletions.
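Point 1 (rows with NULL in the join column silently dropping out of a total) is easy to reproduce. The sketch below uses Python's sqlite3 and made-up region and sale tables; the NULL comparison rule it relies on is standard SQL behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE region (code TEXT UNIQUE, name TEXT);      -- UNIQUE, so NULL is allowed
CREATE TABLE sale   (region_code TEXT, amount INTEGER);
INSERT INTO region VALUES ('N', 'North'), (NULL, 'Unassigned');
INSERT INTO sale   VALUES ('N', 100), (NULL, 50);
""")

total_all = conn.execute("SELECT SUM(amount) FROM sale").fetchone()[0]

# NULL = NULL is not true, so the second sale finds no partner and drops out.
total_joined = conn.execute("""
    SELECT SUM(s.amount)
    FROM sale AS s
    JOIN region AS r ON r.code = s.region_code
""").fetchone()[0]
```

The two totals differ by exactly the rows whose join column is NULL, which is the kind of discrepancy described above. A primary key on region.code would have rejected the NULL row in most engines.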
Primary keys should be used in situations where you will be establishing relationships from this table to other tables that will reference this value. However, depending on the nature of the table and the data that you're thinking of applying the unique constraint to, you may be able to use that particular field as a natural primary key rather than having to establish a surrogate key. Of course, surrogate vs natural keys are a whole other discussion. :)
Unique keys can be used if there will be no relationship established between this table and other tables. For example, a table that contains a list of valid email addresses that new user records are checked against before being inserted. Unique keys can also be used when a table has a primary key but other values in it must also be absolutely unique. For example, in a users table with a user name column, you wouldn't want to use the user name as the primary key, but it must still be unique in order to be used for login purposes.
We need to make a distinction here between logical constructs and physical constructs, and similarly between theory and practice.
To begin with: from a theoretical perspective, if you don't have a primary key, you don't have a table. It's just that simple. So, your question isn't whether your table should have a primary key (of course it should) but how you label it within your RDBMS.
At the physical level, most RDBMSs implement the Primary Key constraint as a Unique Index. If your chosen RDBMS is one of these, there's probably not much practical difference between designating a column as a Primary Key and simply putting a unique constraint on the column. However: one of these options captures your intent, and the other doesn't. So, the decision is a no-brainer.
Furthermore, some RDBMSs make additional features available if Primary Keys are properly labelled, such as diagramming, and semi-automated foreign-key-constraint support.
Anyone who tells you to use Unique Constraints instead of Primary Keys as a general rule should provide a pretty damned good reason.
The thing is that a primary key can be one or more columns which uniquely identify a single record of a table, whereas a unique constraint is just a constraint on a field which allows only a single instance of any given data element in a table.
PERSONALLY, I use either GUIDs or auto-incrementing BIGINTs (IDENTITY columns in SQL Server) for the unique keys used for cross-referencing amongst my tables. Then I'll use other data to allow the user to select specific records.
For example, I'll have a list of employees, and have a GUID attached to every record that I use behind the scenes, but when the user selects an employee, they're selecting them based off of the following fields: LastName + FirstName + EmployeeNumber.
My primary key in this scenario is LastName + FirstName + EmployeeNumber, while the unique key is the associated GUID.
posts saying that it's better to use unique constraint (aka unique index for some db) instead of PK
I guess that the only point here is the same old discussion of "natural vs surrogate keys", because unique indexes and PKs are the same thing.
translating:
posts saying that it's better to use natural key instead of surrogate key
I usually use both a PK and a UNIQUE KEY, because even if you don't denote a PK in your schema, one is always generated for you internally. That's true for both SQL Server 2005 and MySQL 5.
But I don't use the PK column in my SQL statements. It is there for management purposes, like DELETEing erroneous rows or finding gaps between PK values if it's set to AUTO INCREMENT. And it makes sense to have a numeric PK rather than a set of columns or char arrays.
I've written a lot on this subject: if you read anything of mine be clear that I was probably referring specifically to Jet a.k.a. MS Access.
In Jet, the tables are physically ordered on the PRIMARY KEY using a non-maintained clustered index (is clustered on compact). If the table has no PK but does have candidate keys defined using UNIQUE constraints on NOT NULL columns then the engine will pick one for the clustered index (if your table has no clustered index then it is called a heap, arguably not a table at all!) How does the engine pick a candidate key? Can it pick one which includes nullable columns? I really don't know. The point is that in Jet the only explicit way of specifying the clustered index to the engine is to use PRIMARY KEY. There are of course other uses for the PK in Jet e.g. it will be used as the key if one is omitted from a FOREIGN KEY declaration in SQL DDL but again why not be explicit.
The trouble with Jet is that most people who create tables are unaware of or unconcerned about clustered indexes. In fact, most users (I wager) put an autoincrement Autonumber column on every table and define the PRIMARY KEY solely on this column while failing to put any unique constraints on the natural key and candidate keys (whether an autoincrement column can actually be regarded as a key without exposing it to end users is another discussion in itself). I won't go into detail about clustered indexes here but suffice to say that IMO a sole autoincrement column is rarely the ideal choice.
Whatever your SQL engine, the choice of PRIMARY KEY is arbitrary and engine specific. Usually the engine will apply special meaning to the PK, therefore you should find out what it is and use it to your advantage. I encourage people to use NOT NULL UNIQUE constraints in the hope they will give greater consideration to all candidate keys, especially when they have chosen to use 'autonumber' columns which (should) have no meaning in the data model. But I'd rather folk chose one well-considered key and used PRIMARY KEY on it rather than putting it on the autoincrement column out of habit.
Should all tables have a PK? I say yes because doing otherwise means at the very least you are missing out on a slight advantage the engine affords the PK and at worst you have no data integrity.
BTW Chris OC makes a good point here about temporal tables, which require sequenced primary keys (lowercase) which cannot be implemented via simple PRIMARY KEY constraints (SQL key words in uppercase).
PRIMARY KEY
1. NULL
It doesn't allow NULL values. Because of this we say PRIMARY KEY = UNIQUE KEY + NOT NULL constraint.
2. INDEX
By default it adds a clustered index.
3. LIMIT
A table can have only one PRIMARY KEY (on one or more columns).
UNIQUE KEY
1. NULL
It allows NULL, but only one NULL value (in SQL Server).
2. INDEX
By default it adds a unique non-clustered index.
3. LIMIT
A table can have more than one UNIQUE KEY (each on one or more columns).
If you plan on using LINQ-to-SQL, your tables will require Primary Keys if you plan on performing updates, and they will require a timestamp column if you plan on working in a disconnected environment (such as passing an object through a WCF service application).
If you like .NET, PKs and FKs are your friends.
I submit that you may need both. Primary keys by nature need to be unique and not nullable. They are often surrogate keys, as integers create faster joins than character fields, and especially than joins on multiple character fields. However, as these are often autogenerated, they do not guarantee uniqueness of the data record excluding the id itself. If your table has a natural key that should be unique, you should have a unique index on it to prevent data entry of duplicates. This is a basic data integrity requirement.
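A quick sketch of why the surrogate key alone is not enough: without a unique index on the natural key, the same value goes in twice with two different ids. This uses Python's sqlite3, and the customer table and email column are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"
)

# With only a surrogate key, the same email is accepted twice.
conn.execute("INSERT INTO customer (email) VALUES ('a@example.com')")
conn.execute("INSERT INTO customer (email) VALUES ('a@example.com')")
duplicates = conn.execute(
    "SELECT COUNT(*) FROM customer WHERE email = 'a@example.com'"
).fetchone()[0]

# Remove the duplicate, then add the unique index on the natural key.
conn.execute("DELETE FROM customer WHERE id = 2")
conn.execute("CREATE UNIQUE INDEX customer_email ON customer (email)")

try:
    conn.execute("INSERT INTO customer (email) VALUES ('a@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```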
Edited to add: It is also a real problem that real world data often does not have a natural key that truly guarantees uniqueness in a normalized table structure, especially if the database is people centered. Names, even name, address and phone number combined (think father and son in the same medical practice) are not necessarily unique.
I was thinking about this problem myself. If you use a unique constraint, you will violate second normal form (2NF). According to 2NF, every non-PK attribute has to depend on the PK. The pair of attributes in this unique constraint would have to be considered part of the PK.
Sorry for replying to this 7 years later, but I didn't want to start a new discussion.