What's the difference between a Primary Key and Identity? - sql-server

In a SQL Server db, what is the difference between a Primary Key and an Identity column? A column can be a primary key without being an indentity. A column cannot, however, be an identity without being a primary key.
In addition to the differences, what does a PK and Identity column offer that just a PK column doesn't?

A column can definitely be an identity without being a PK.
An identity is simply an auto-increasing column.
A primary key is the unique column or columns that define the row.
These two are often used together, but there's no requirement that this be so.

This answer is more of WHY identity and primary key than WHAT they are since Joe has answered WHAT correctly above.
An identity is a value your SQL controls. Identity is a row function. It is sequential, either increasing or decreasing in value, at least in SQL Server. It should never be modified and gaps in the value should be ignored. Identity values are very useful in linking table B to table A since the value is never duplicated. The identity is not the best choice for a clustered index in every case. If a table contains audit data the clustered index may be better being created on the date occurred as it will present the answer to the question " what happened between today and four days ago" with less work because the records for the dates are sequential in the data pages.
A primary key makes the column or columns in a row unique. Primary key is a column function. Only one primary key may be defined on any table, but multiple unique indexes may be created which simulates the primary key. Clustering the primary key is not always the correct choice. Consider a phone book. If the phone book is clustered by the primary key(phone number) the query to return the phone numbers on "First Street" will be very costly.
The general rules I follow for identity and primary key are:
Always use an identity column
Create the clustered index on the column or columns which are used in range lookups
Keep the clustered index narrow since the clustered index is added to the end of every other index
Create primary key and unique indexes to reject duplicate values
Narrow keys are better
Create an index for every column or columns used in joins
These are my GENERAL rules.

A primary key (also known as a candidate key) is any set of attributes that have the properties of uniqueness and minimality. That means the key column or columns are constrained to be unique. In other words the DBMS won't permit any two rows to have the same set of values for those attributes.
The IDENTITY property effectively creates an auto-incrementing default value for a column. That column does not have to be unique though, so an IDENTITY column isn't necessarily a key.
However, an IDENTITY column is typically intended to be used as a key and therefore it usually has a uniqueness constraint on it to ensure that duplicates are not permitted.

Major Difference between Primary and Identity Column
Primary Column:
Primary Key cannot have duplicate values.
It creates a clustered index for the Table.
It can be set for any column type.
We need to provide the primary column value while inserting in the table.
Identity Column:
Identity Column can have duplicate value.
It can only be set for Integer related columns like int, bigint, smallint, tinyint or decimal
No need to insert values in the identity column. It is inserted automatically based on the seed.

EDITS MADE BASED ON FEEDBACK
A key is unique to a row. It's a way of identifying a row. Rows may have none, one, or several keys. These keys may consist of one or more columns.
Keys are indexes with a unique constraint. This differentiates them from non-key indexes.
Any index with multi-columns is called a "composite index".
Traditionally, a primary key is viewed as the main key that uniquely identifies a row. There may only be one of these.
Depending on the table's design, one may have no primary key.
A primary key is just that - a "prime key". It's the main one that specifies the unique identity of a row. Depending on a table's design, this can be a misnomer and multiple keys express the uniqueness.
In SQL Server, a primary key may be clustered. This means the remaining columns are attached to this key at the leaf level of the index. In other words, once SQL Server has found the key, it has also found the row (to be clear, this is because of the clustered aspect).
An identity column is simply a method of generating a unique ID for a row.
These two are often used together, but this is not a requirement.

You can use IDENTITY not only with integers, but also with any numeric data type that has a scale of 0
primary key could have scale but its not required.
IDENTITY, combined with a PRIMARY KEY or UNIQUE constraint, lets you provide a simple unique row identifier

Primary key emphasizing on uniqueness and avoid duplication value for all records on the same column, while identity provides increasing numbers in a column without inserting data.
Both features could be on a single column or on difference one.

Related

Primary keys, index and constrains in SQL

Please correct if im wrong. And kindly point me to articles on this concept.
When we create a primary key, in the background there is automatically a unique index, clustered index, and a not null constraint created on that coloumn.
Does this also mean that if we create a not null constraint, [clustered index or non clustered index] and unique index on a column, then that column becomes a primary key?
I want to understand the core concept/relation between primary key, index and constrains.
The primary key is the one that is declared as the "primary" key. Just having the characteristics doesn't make a key "primary". It has to be explicitly declared as such.
Different databases implement primary keys in different ways. Although primary keys are usually implemented with a clustered unique index, that is not a requirement.
The primary key is exactly what its name suggests: "primary". Any other column or group of columns can be declared both unique and not null. That does not make them primary keys. In some databases, you could even define another column or group of columns as not null, unique and clustered -- without that being the primary key.
In summary:
You can have any number of unique indexes on a table.
You can have any number of unique indexes on non-NULL columns on a table.
You can have at most one clustered index. In almost all cases, this would be the primary key. But is not required in all databases.
You can have at most one primary key. In almost all cases, this would be clustered, although that is not required in all databases.
For more detail, you should refer to the documentation of the database you are using.
If you have multiple columns comprising non-NULL, unique keys, then only one is "primary" -- that one that has been explicitly declared as primary.
Why would you have a non-clustered primary key? I can give one scenario. Imagine a database where UUIDs are the keys for rows. The company does not want to use auto-generated sequence numbers, because they provide information in the number.
However, UUIDs are remarkably bad candidates for clustered indexes, because inserts are almost never at the end. In this case, you might want to design the table with a clustered auto-generated sequential key, to speed inserts You might make this key the primary key. But, you want all foreign key references to use the UUID -- and you want all foreign key references to be to the primary key of the table.
No.
All the columns could be added with Not null and Non-clustered index and Unique But only ONE column could be PK.
And the Unique allows NULL while Primary Key does not.
You might be talking about Candidate Key, here is the ref:
https://www.techopedia.com/definition/21/candidate-key

Database design: primary keys and unique constraints

I am building my third database and I am unsure how to structure primary keys and unique constraints to ensure data integrity.
The database is updated periodically when public data is released, some monthly, some annually and will have a low query load.
I have some tables where a combination of columns makes each row unique. I am aware that I can use a composite primary key but I am unsure if this is best practise as many of the articles I read on this have conflicting views.
For example, one table requires five columns for uniqueness with datatypes:
smallint
char(7)
varchar(7)
smallint
varchar(3)
I believe this would make the primary key up to 25 bytes and all of these five columns will regular be selected and often used in the where clause. One of the smallints is also a foreign key but no foreign keys will reference this table.
The only alternative I am aware of is to create a unique identifier for each row, setting this as the primary key and creating a unique constraint over the five columns.
What is the best option and if it is to use a composite primary key should it be clustered or non-clustered?
Is there a recommended maximum number of columns and bytes for a composite primary key?
I don't know what is the purpose of your database, but first advise if you can add ,id column as primary key. It'll be easier for every query.
Then If you want just be sur of unique combination of columns in the same table, the easy way is a index with is the keyword UNIQUE:
CREATE UNIQUE NONCLUSTERED INDEX [IX_COLUMN1_COLUMN2]
ON TABLE(COLUMN1 ASC, COLUMN2ASC);

Non nullable column with clustered unique index. Why need Primary Key?

In SQL Server, I have a non nullable column with a unique clustered index on it.
If I make this column a Primary Key the exact same index is created automatically plus
the column is recognized as a Primary Key.
I understand the abstract/semantic difference.
(Primary Key identifies the entity, while any other column with this index may not.
For example, a Person can have Email field which is Unique,Non-nullable... but can be changed)
But what bothers me is the actual difference when it comes to the DB engine itself.
What will happen if I will just create an Id column, make it non-nullable, create a unique clustered index for it, make it Identity Increment, but without the Primary Key constraint?
In what scenarios the Primary Key constraint comes into play?
(I've looked at many related questions before asking this, but all the answers I saw ended up with an abstract/theoretical explanation).
Nothing will be different really. You specify PRIMARY KEY to relay your intentions, not so that the engine does anything differently. When constructing a query plan, the optimizer will still use the uniqueness for all of its properties, and will still use the clustered index for all of its properties, regardless of whether you technically created it as a PRIMARY KEY. When creating a FOREIGN KEY, you can still reference the column(s) specified as unique (clustered or not). The difference is solely in the metadata (sys.indexes.is_primary_key) and in SSMS' representation to you (oh and the fact that you can create a unique clustered index on a NULLable column, but you can't create a PRIMARY KEY on that column).
In fact there are many cases where you want to completely separate the clustered index from the PRIMARY KEY. If you have a table where the PK is a GUID, for example, and you are typically running date range queries against the table, you are probably better off having the PK be non-clustered and have a clustered index on a naturally increasing column (the datetime column) - both to minimize page splits on heavy insert activity and also to best assist date range queries. The non-clustered index will be perfectly fine for looking up individual GUIDs. (I wanted to mention that because a lot of people think the primary key has to be clustered. Not true.)
Also interesting to note that if you create a PRIMARY KEY constraint, then create a unique clustered index with the same name using DROP_EXISTING, the is_primary_key column will still be 1 and Object Explorer will still show the index name under Keys.
Here is one scenario - a lot of code to data mapping frameworks look at the database metadata (what are the primary keys, foreign keys, etc) to determine how code is executed. For example Hibernate requires a primary key.
A typical scenario might be generating a where clause for an update.

Clustered Indexes without Primary key

A clustered index stores the actual data rows at the leaf level of the index. Returning to the example above, that would mean that the entire row of data associated with the primary key value of 123 would be stored in that leaf node.
Question - in case the primary key does not exists and I set the Name column as clustered index. In this case, will the above statement becomes contradictory?
No - why?
The clustered index will still store the actual data pages at its leaf level, (initially) physically sorted by the name column.
The index navigation structure above the leaf level will contain the name column values for all rows.
So overall: nothing changes.
The primary key is a logical construct, designed to uniquely identify each row in your table. That's why it has to be unique and non-null.
The clustering index is a physical construct that will (initially) phyiscally sort your data by the clustering key and arrange the SQL Server pages accordingly.
While in SQL Server, the primary is used by default as the clustering key, the two do not have to fall together - nor does one have to exist with the other. You can have a table with a non-clustered primary key, or a clustered table without primary key. Both is possible. Whether it's sensible to have that is another discussion - but it's technically possible.
Update: if your primary key is your clustering key, uniqueness is guaranteed (since the primary key must be unique). If you're choosing some column that is not the primary key as your clustering key, and that column does not guarantee uniqueness, SQL Server will - behind the scenes - add a 4-byte (INT) uniqueifier column to those duplicates values to make them unique. So you might have Smith, Smith1, Smith2 and so forth in your clustered index navigation structure for your Smith's.
See:
MSDN: Clustering Index Design Guidelines
Simple-Talk: Effective Clustered Indexes
If the clustered index is not unique, SQL Server creates a 4-byte uniqueifier and adds it to the clustered index value. The uniqueifier is added only if the clustered index value is duplicate, not for all clustered index values.
All nonclustered indexes will contain this value in its leaf level, and non-unique nonclustered index will also have this uniqueifier value in its non-leaf level entry, as a part of bookmark.
Difference between a Primary key and a unique index (or constraint) is that Null values are not allowed in a the primary key column. There is no need to have a primary key on a table but it make things easier for external application to edit the rows in the table and even then, it's not really a necessity with most external applications.
In term of performance, this change nothing. The important is the presence or absence of indexes (either unique or not, clustered or not and with null values or not) and the primary key is essentially simply one more unique index without null value.
For the clustered index, the column doesn't need to be unique and/or without null. A column with duplicates and null values is fine for creating a clustered index.
For a foreign key, it must reference a column with a unique index on it but not necessarily a primary key or without null value. It's perfectly legal to reference a column that is not a primary key and is allowing null value a long as there is a unique index on it. Notice that because there must be an unique index on it, this column cannot have more than a single null value.
There is no limitation on the foreign key column itself (the column on the foreign table) but performance wise, setting an index on it is often a good thing.

What is the difference between a primary key and a index key

Can anyone tell me what is the difference between a primary key and index key. And when to use which?
A primary key is a special kind of index in that:
there can be only one;
it cannot be nullable; and
it must be unique.
You tend to use the primary key as the most natural unique identifier for a row (such as social security number, employee ID and so forth, although there is a school of thought that you should always use an artificial surrogate key for this).
Indexes, on the other hand, can be used for fast retrieval based on other columns. For example, an employee database may have your employee number as the primary key but it may also have an index on your last name or your department.
Both of these indexes (last name and department) would disallow NULLs (probably) and allow duplicates (almost certainly), and they would be useful to speed up queries looking for anyone with (for example) the last name 'Corleone' or working in the 'HitMan' department.
A key (minimal superkey) is a set of attributes, the values of which are unique for every tuple (every row in the table at some point in time).
An index is a performance optimisation feature that enables data to be accessed faster.
Keys are frequently good candidates for indexing and some DBMSs automatically create indexes for keys, but that doesn't have to be so.
The phrase "index key" mixes these two quite different words and might be best avoided if you want to avoid any confusion. "Index key" is sometimes used to mean "the set of attributes in an index". However the set of attributes in question are not necessarily a key because they may not be unique.
Oracle Database enforces a UNIQUE key or PRIMARY KEY integrity constraint on a table by creating a unique index on the unique key or primary key. This index is automatically created by the database when the constraint is enabled.
You can create indexes explicitly (outside of integrity constraints) using the SQL statement CREATE INDEX .
Indexes can be unique or non-unique. Unique indexes guarantee that no two rows of a table have duplicate values in the key column (or columns). Non-unique indexes do not impose this restriction on the column values.
Use the CREATE UNIQUE INDEX statement to create a unique index.
Specifying the Index Associated with a Constraint
If you require more explicit control over the indexes associated with UNIQUE and PRIMARY KEY constraints, the database lets you:
1. Specify an existing index that the database is to use
to enforce the constraint
2. Specify a CREATE INDEX statement that the database is to use to create
the index and enforce the constraint
These options are specified using the USING INDEX clause.
Example:
CREATE TABLE a (
a1 INT PRIMARY KEY USING INDEX (create index ai on a (a1)));
http://docs.oracle.com/cd/B28359_01/server.111/b28310/indexes003.htm
Other responses are defining the Primary Key, but not the Primary Index.
A Primary Index isn't an index on the Primary Key.
A Primary Index is your table's data structure, but only if your data structure is ordered by the Primary Key, thus allowing efficient lookups without a requiring a separate data structure to look up records by the Primary Key.
All databases (that I'm aware of) have a Primary Key.
Not all databases have a Primary Index. Most of those that don't build a secondary index on the Primary Key by default.

Resources