A clustered index stores the actual data rows at the leaf level of the index. Returning to the example above, that would mean that the entire row of data associated with the primary key value of 123 would be stored in that leaf node.
Question - if no primary key exists and I set the Name column as the clustered index, does the above statement become contradictory?
No - why?
The clustered index will still store the actual data pages at its leaf level, (initially) physically sorted by the name column.
The index navigation structure above the leaf level will contain the name column values for all rows.
So overall: nothing changes.
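A minimal sketch of that situation, assuming a hypothetical dbo.Person table (the table, column, and index names are illustrative, not from the question):

```sql
-- Hypothetical table with no primary key at all.
CREATE TABLE dbo.Person
(
    Name      nvarchar(100) NOT NULL,
    BirthDate date          NULL
);

-- Non-unique clustered index on Name: the leaf level of this index
-- holds the complete rows, ordered by Name.
CREATE CLUSTERED INDEX CIX_Person_Name ON dbo.Person (Name);
```

The leaf level still contains whole rows; only the key that orders them has changed.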
The primary key is a logical construct, designed to uniquely identify each row in your table. That's why it has to be unique and non-null.
The clustering index is a physical construct that will (initially) physically sort your data by the clustering key and arrange the SQL Server pages accordingly.
While in SQL Server the primary key is used by default as the clustering key, the two do not have to coincide - nor does one have to exist with the other. You can have a table with a non-clustered primary key, or a clustered table without a primary key. Both are possible. Whether it's sensible to have that is another discussion - but it's technically possible.
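Both combinations can be sketched as follows. The table and index names (dbo.Orders, dbo.EventLog, and so on) are assumptions made up for illustration:

```sql
-- (a) Non-clustered primary key, clustered index on a different column.
CREATE TABLE dbo.Orders
(
    OrderID   int       NOT NULL
        CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED,
    OrderDate datetime2 NOT NULL
);
CREATE CLUSTERED INDEX CIX_Orders_OrderDate ON dbo.Orders (OrderDate);

-- (b) Clustered table with no primary key at all.
CREATE TABLE dbo.EventLog
(
    LoggedAt datetime2     NOT NULL,
    Message  nvarchar(400) NOT NULL
);
CREATE CLUSTERED INDEX CIX_EventLog_LoggedAt ON dbo.EventLog (LoggedAt);
```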
Update: if your primary key is your clustering key, uniqueness is guaranteed (since the primary key must be unique). If you're choosing some column that is not the primary key as your clustering key, and that column does not guarantee uniqueness, SQL Server will - behind the scenes - add a 4-byte (INT) uniqueifier to the duplicate values to make them unique. So conceptually you might have Smith, Smith1, Smith2 and so forth in your clustered index navigation structure for your Smiths.
See:
MSDN: Clustering Index Design Guidelines
Simple-Talk: Effective Clustered Indexes
If the clustered index is not unique, SQL Server creates a 4-byte uniqueifier and adds it to the clustered index value. The uniqueifier is added only if the clustered index value is a duplicate, not for all clustered index values.
All nonclustered indexes will contain this value at their leaf level, and a non-unique nonclustered index will also have this uniqueifier value in its non-leaf level entries, as part of the bookmark.
The difference between a primary key and a unique index (or constraint) is that null values are not allowed in the primary key column. There is no need to have a primary key on a table, but it makes things easier for external applications that edit the rows in the table, and even then it's not really a necessity with most external applications.
In terms of performance, this changes nothing. What matters is the presence or absence of indexes (unique or not, clustered or not, with or without null values), and the primary key is essentially just one more unique index without null values.
For the clustered index, the column doesn't need to be unique and/or without null. A column with duplicates and null values is fine for creating a clustered index.
A foreign key must reference a column with a unique index on it, but that column does not have to be a primary key or be non-nullable. It's perfectly legal to reference a column that is not a primary key and allows null values as long as there is a unique index on it. Notice that because there must be a unique index on it, this column cannot have more than a single null value.
There is no limitation on the foreign key column itself (the column on the foreign table) but performance wise, setting an index on it is often a good thing.
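As a sketch of the foreign key points above, assuming hypothetical dbo.Customer and dbo.Invoice tables (all names here are illustrative):

```sql
CREATE TABLE dbo.Customer
(
    CustomerID   int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Customer PRIMARY KEY,
    CustomerCode char(8) NULL          -- nullable and not the primary key
);

-- A unique index is enough to make CustomerCode referenceable.
CREATE UNIQUE INDEX UX_Customer_CustomerCode
    ON dbo.Customer (CustomerCode);

CREATE TABLE dbo.Invoice
(
    InvoiceID    int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Invoice PRIMARY KEY,
    CustomerCode char(8) NULL,
    CONSTRAINT FK_Invoice_Customer FOREIGN KEY (CustomerCode)
        REFERENCES dbo.Customer (CustomerCode)
);

-- Optional, but often helpful for joins and for checks on the parent table.
CREATE INDEX IX_Invoice_CustomerCode ON dbo.Invoice (CustomerCode);
```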
Please correct me if I'm wrong, and kindly point me to articles on this concept.
When we create a primary key, a unique index, a clustered index, and a not null constraint are automatically created on that column in the background.
Does this also mean that if we create a not null constraint, [clustered index or non clustered index] and unique index on a column, then that column becomes a primary key?
I want to understand the core concept/relation between primary key, index and constraints.
The primary key is the one that is declared as the "primary" key. Just having the characteristics doesn't make a key "primary". It has to be explicitly declared as such.
Different databases implement primary keys in different ways. Although primary keys are usually implemented with a clustered unique index, that is not a requirement.
The primary key is exactly what its name suggests: "primary". Any other column or group of columns can be declared both unique and not null. That does not make them primary keys. In some databases, you could even define another column or group of columns as not null, unique and clustered -- without that being the primary key.
In summary:
You can have any number of unique indexes on a table.
You can have any number of unique indexes on non-NULL columns on a table.
You can have at most one clustered index. In almost all cases, this would be the primary key, but that is not required in all databases.
You can have at most one primary key. In almost all cases, this would be clustered, although that is not required in all databases.
For more detail, you should refer to the documentation of the database you are using.
If you have multiple columns comprising non-NULL, unique keys, then only one is "primary" -- that one that has been explicitly declared as primary.
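A short SQL Server sketch of that last point, assuming a hypothetical dbo.Account table. The column has every characteristic of a primary key, yet the catalog does not treat it as one because it was never declared as such:

```sql
CREATE TABLE dbo.Account
(
    AccountNo int           NOT NULL,
    Email     nvarchar(256) NOT NULL
);

-- NOT NULL + unique + clustered, but not declared PRIMARY KEY.
CREATE UNIQUE CLUSTERED INDEX UCX_Account_AccountNo
    ON dbo.Account (AccountNo);

-- is_primary_key is expected to be 0 for this index.
SELECT name, type_desc, is_unique, is_primary_key
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.Account');
```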
Why would you have a non-clustered primary key? I can give one scenario. Imagine a database where UUIDs are the keys for rows. The company does not want to use auto-generated sequence numbers, because they provide information in the number.
However, UUIDs are remarkably bad candidates for clustered indexes, because inserts are almost never at the end. In this case, you might want to design the table with a clustered auto-generated sequential key, to speed inserts. You might be tempted to make this sequential key the primary key. But you want all foreign key references to use the UUID -- and you want all foreign key references to be to the primary key of the table -- so the UUID becomes the primary key, backed by a non-clustered index.
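One way to sketch this design in SQL Server, assuming hypothetical dbo.Document and dbo.DocumentTag tables (every name below is illustrative):

```sql
CREATE TABLE dbo.Document
(
    DocumentSeq  bigint IDENTITY(1,1) NOT NULL,   -- sequential, never exposed
    DocumentGuid uniqueidentifier     NOT NULL
        CONSTRAINT DF_Document_Guid DEFAULT NEWID(),
    Title        nvarchar(200)        NOT NULL,
    -- The UUID is the declared primary key, but it is not the clustering key.
    CONSTRAINT PK_Document PRIMARY KEY NONCLUSTERED (DocumentGuid)
);

-- Rows are physically ordered by the sequential key, so inserts go to the end.
CREATE UNIQUE CLUSTERED INDEX UCX_Document_Seq
    ON dbo.Document (DocumentSeq);

-- Foreign keys reference the primary key, that is, the UUID.
CREATE TABLE dbo.DocumentTag
(
    DocumentGuid uniqueidentifier NOT NULL
        CONSTRAINT FK_DocumentTag_Document
        FOREIGN KEY REFERENCES dbo.Document (DocumentGuid),
    Tag          nvarchar(50)     NOT NULL
);
```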
No.
Any number of columns can be declared NOT NULL with a unique non-clustered index, but a table can have only ONE primary key.
Also, a unique index allows NULL while a primary key does not.
You might be talking about a candidate key; here is a reference:
https://www.techopedia.com/definition/21/candidate-key
I was searching for how to move a table from one filegroup to another, and I had some doubts as to why most of the replies I found dealt with clustered indexes, considering that my question had to do with tables.
Then I looked at How I can move table to another filegroup?, and it says that the clustered index is the table data, which explains the reasoning behind recreating a clustered index with CREATE CLUSTERED INDEX.
But in that same question it says that if my clustered index is unique, then do something else.
My question: I assume that when I create tables on a database, a clustered index is created for that table. So how can it not be unique?
Thanks.
If you have an int array and you store the number 1 twice in it - how can that array not be unique?! (Trick question to get you thinking. It clearly can be not unique.) Being unique is a constraint on the data. Fundamentally, there is nothing preventing you from creating multiple rows that have the same values in all columns.
In a heap this is not a problem physically at all. The internal row identifier is its location on disk.
In a b-tree based index (a "clustered index") the physical data structure indeed requires uniqueness. Note that the logical structure (the table) does not. This is a physical concern. It's an implementation detail. SQL Server does this by internally appending a key column that contains a sequence number that is counted upwards. This disambiguates the records. Because that number is a 4-byte integer, you can observe this effect by creating more than about 2^31 (roughly 2.1 billion) rows with the same non-unique key. You will receive an error.
So there's a hidden column in the table that you cannot access. It's officially called "uniqueifier". Internally, it's used to complete the CI key to make it unique. It's stored and used everywhere where normally the unique CI key would be used: In the CI, in non-unique NCIs, in the lock hash and in query plans.
If the clustered index is not unique, SQL Server internally creates a uniquifier to make each record unique. I will try to explain with an example:
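CREATE TABLE Test2 (Col1 INT, Col2 INT);
CREATE CLUSTERED INDEX idxClustered ON Test2 (Col1);
CREATE NONCLUSTERED INDEX idxNonClustered ON Test2 (Col2);
-- The clustered index here is not unique, so duplicate Col1 values are allowed
INSERT INTO Test2 VALUES (1,1), (2,2);
INSERT INTO Test2 VALUES (3,3);
INSERT INTO Test2 VALUES (3,3);
-- List the pages of every index on Test2 in database Test (-1 = all indexes)
-- and note the PagePID of a nonclustered index page
DBCC IND ('Test', 'Test2', -1);
-- Examine the contents of that page
-- Not to be run in production
DBCC TRACEON (3604); -- send DBCC output to the client session
-- 3376 was the page number on the author's system; use the PagePID from DBCC IND
DBCC PAGE ('Test', 1, 3376, 3);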
You will see the uniquifier key with the corresponding uniqueness value. If your clustered index is a unique clustered index, it will not have that uniquifier attribute.
usr has a good post worth reading. I will add here from Microsoft's documentation.
First, you are not alone with Clustered-Indexes. Honestly, the name itself is somewhat confusing (Structured-Indexes or Disk-Indexes would probably be better in SQL).
Refer back to the official documentation from MSDN. Any alterations by me are in italics:
A Clustered Index is an on-disk structure of the table. This means the values are pointing to a physical location. This is why when you move the table you need to recreate the Index because the physical location has been altered.
Clustered
Clustered indexes sort and store the data rows in the table or view
based on their key values. These are the columns included in the index
definition. There can be only one clustered index per table, because
the data rows themselves can be sorted in only one order.
The only time the data rows in a table are stored in sorted order is
when the table contains a clustered index. When a table has a
clustered index, the table is called a clustered table. If a table has
no clustered index, its data rows are stored in an unordered structure
called a heap.
Nonclustered
Nonclustered indexes have a structure separate from the data rows (like pointers, this is a logical ordering of the data that consumes a fraction of the physical disk space).
A nonclustered index contains the nonclustered index key values and each
key value entry has a pointer to the data row that contains the key
value.
The pointer from an index row in a nonclustered index to a data row
is called a row locator. The structure of the row locator depends on
whether the data pages are stored in a heap or a clustered table (think ordered).
For a heap, a row locator is a pointer to the row.
For a clustered table, the row locator is the clustered index key.
ABSTRACT VIEW:
A table created is not necessarily a clustered (ordered) table.
An index does not necessarily have to be unique. It is an abstract view of the table.
Unique means that a value or set of values will not repeat themselves. If you wish to enforce this, you can add a constraint by the index (i.e. UNIQUE CLUSTERED INDEX) or a CONSTRAINT such as PRIMARY KEY if you wish this to be managed in the table structure itself.
You may have multiple unique indexes on a table, since each one enforces uniqueness only over its own key columns, independently of the others.
Consider you have Columns A, B, and C in a given table.
Column A was created with a UNIQUE CLUSTERED INDEX. This means that either A already had an enforceable UNIQUE constraint (like PK, UNIQUE CONSTRAINT) or was DECLARED EXPLICITLY.
A column group {B,C} could be a unique index so long as the same pair of B and C values never repeats. In the same way, you could theoretically have indexes with the groups {A}, {B,C}, {A,C}, and every one of them be unique. Recall that an index is a logical ordering of the data, so they likely will not have the same logical value (and thus are unique).
HOWEVER: unless the datatype, constraint (including the INDEX constraint), or table structure enforces a unique constraint on a COLUMN, you should not assume the index is unique. Furthermore, you cannot create a UNIQUE index if more than one row contains the same combination of NULL values, since SQL Server will treat them as the same value (NULL being unknown).
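The NULL behaviour can be sketched like this, assuming a hypothetical dbo.Contact table. The failing statement is left commented out so the script runs cleanly:

```sql
CREATE TABLE dbo.Contact
(
    ContactID int IDENTITY(1,1) NOT NULL,
    Email     nvarchar(256)     NULL
);

CREATE UNIQUE INDEX UX_Contact_Email ON dbo.Contact (Email);

-- The first NULL is accepted.
INSERT INTO dbo.Contact (Email) VALUES (NULL);

-- A second NULL counts as a duplicate of the first and is expected
-- to fail with a duplicate key error:
-- INSERT INTO dbo.Contact (Email) VALUES (NULL);
```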
Will SQL Server use your indexes, unique or not? Well that is another story and depends on a number of things. But hopefully you find this post helpful.
Sources:
MSDN - Clustered and Nonclustered Indexes Described
A clustered index doesn't have to be unique. But, there can be only one clustered index on a table, because a clustered index actually determines the physical order of the table rows on disk (but I find it confusing to say that the clustered index is the table data, per se, even though they are strongly tied to each other).
HERE is a good post all about non-unique clustered indexes. Even if the index was the entire row of data, you can certainly have duplicate rows (no PK), which would equate to duplicate clustered index nodes.
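For the filegroup move that prompted the question, a minimal sketch could look like the following. It assumes the Test2 table and idxClustered index from the earlier example and a filegroup named SECONDARY that already exists in the database; both the filegroup name and the second index name are assumptions.

```sql
-- Rebuilding the clustered index on another filegroup moves the table data,
-- because the leaf level of the clustered index is the data.
CREATE CLUSTERED INDEX idxClustered
    ON dbo.Test2 (Col1)
    WITH (DROP_EXISTING = ON)
    ON [SECONDARY];

-- If the existing clustered index is unique (for example because it backs a
-- PRIMARY KEY or UNIQUE constraint), keep the same definition and repeat
-- UNIQUE when rebuilding it (hypothetical index and table names):
-- CREATE UNIQUE CLUSTERED INDEX PK_SomeTable
--     ON dbo.SomeTable (Id)
--     WITH (DROP_EXISTING = ON)
--     ON [SECONDARY];
```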
In SQL Server, I have a non nullable column with a unique clustered index on it.
If I make this column a Primary Key the exact same index is created automatically plus
the column is recognized as a Primary Key.
I understand the abstract/semantic difference.
(Primary Key identifies the entity, while any other column with this index may not.
For example, a Person can have Email field which is Unique,Non-nullable... but can be changed)
But what bothers me is the actual difference when it comes to the DB engine itself.
What will happen if I will just create an Id column, make it non-nullable, create a unique clustered index for it, make it Identity Increment, but without the Primary Key constraint?
In what scenarios does the Primary Key constraint come into play?
(I've looked at many related questions before asking this, but all the answers I saw ended up with an abstract/theoretical explanation).
Nothing will be different really. You specify PRIMARY KEY to relay your intentions, not so that the engine does anything differently. When constructing a query plan, the optimizer will still use the uniqueness for all of its properties, and will still use the clustered index for all of its properties, regardless of whether you technically created it as a PRIMARY KEY. When creating a FOREIGN KEY, you can still reference the column(s) specified as unique (clustered or not). The difference is solely in the metadata (sys.indexes.is_primary_key) and in SSMS' representation to you (oh and the fact that you can create a unique clustered index on a NULLable column, but you can't create a PRIMARY KEY on that column).
In fact there are many cases where you want to completely separate the clustered index from the PRIMARY KEY. If you have a table where the PK is a GUID, for example, and you are typically running date range queries against the table, you are probably better off having the PK be non-clustered and have a clustered index on a naturally increasing column (the datetime column) - both to minimize page splits on heavy insert activity and also to best assist date range queries. The non-clustered index will be perfectly fine for looking up individual GUIDs. (I wanted to mention that because a lot of people think the primary key has to be clustered. Not true.)
Also interesting to note that if you create a PRIMARY KEY constraint, then create a unique clustered index with the same name using DROP_EXISTING, the is_primary_key column will still be 1 and Object Explorer will still show the index name under Keys.
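The one engine-level difference mentioned above (nullability) can be sketched as follows, assuming a hypothetical dbo.Demo table. The statement that is expected to fail is commented out:

```sql
CREATE TABLE dbo.Demo (Id int NULL);

-- A unique clustered index on a nullable column is allowed.
CREATE UNIQUE CLUSTERED INDEX UCX_Demo_Id ON dbo.Demo (Id);

-- A PRIMARY KEY on the same nullable column is expected to be rejected:
-- ALTER TABLE dbo.Demo
--     ADD CONSTRAINT PK_Demo PRIMARY KEY NONCLUSTERED (Id);

-- The metadata flag is where the distinction is recorded.
SELECT name, type_desc, is_unique, is_primary_key
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.Demo');
```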
Here is one scenario - a lot of code-to-data mapping frameworks look at the database metadata (what are the primary keys, foreign keys, etc.) to determine how code is executed. For example, Hibernate requires a primary key.
A typical scenario might be generating a where clause for an update.
I have the following table that serves to join 3 tables:
ClientID int
BlogID int
MentionID int
Assuming that queries will always come via ClientID, I can create 1 multi-column index (ClientID, BlogID, MentionID).
The question is, should I create it as a clustered index or a unique key? I understand a clustered index stores the data on its leaf nodes. Of course, in this case, the index is the data, so I don't know if SQL Server will duplicate the data or not. Be that as it may, I can't find anything on MSDN about the significance of using "unique key".
How does this differ from Type = Index & IsUnique = yes?
Can someone tell me the advantages each way?
A clustered index is "the table itself": index nodes are arranged in a tree, and its leaf nodes contain the row data. A clustered index doesn't have to be declared as unique (though it usually is); if it is not unique, the server implicitly adds a "uniquifier" to this index, so that each row is uniquely identified.
Other indexes store the clustered index value in their leaf nodes (and possibly some other columns if they are included with the INCLUDE clause in the CREATE INDEX statement).
Any index may be declared as unique, in which case the server performs an additional check to prevent duplicate values from getting into the table.
It seems you are asking about the difference between:
MYTABLE
id integer primary key autoincrement
clientid integer
blogid integer
mentionid integer
-- with a unique composite index on (clientid, blogid, mentionid) and three foreign key constraints
and
MYTABLE
clientid
blogid
mentionid
-- with a composite primary key on (clientid, blogid, mentionid) and three foreign key constraints
and
MYTABLE
id integer primary key autoincrement
clientid integer
blogid integer
mentionid integer
-- with an index on clientid and also an index on blogid and the three foreign key constraints
In the first, you have the index on the integer primary key and also the alternative unique index on the triad. In the second, you have only the unique index on the triadic primary key. In the third, you have a unique index on the integer primary key and two other non-unique indexes, one on clientid and the other on blogid.
The performance gain with the second option's marginally greater efficiency would be de minimis, and so I'd base the decision on other factors. The third is the most flexible in terms of queries and offers greater simplicity of coding; it offers the benefit of indexes on client and blog both, in case you wanted to have a query with blog, not client, in the WHERE clause. As for coding, some GUI tools and middleware have trouble with multi-part primary keys, and your update/insert/delete logic will be simpler when it has to deal with a single integer PK column. I have found that code simplicity and ease of maintenance are far better things than a few seconds or only a few fractions of seconds of improvement in query response time.
A unique index, a unique key and a unique constraint are basically the same thing. They result in an index that enforces uniqueness.
Clustered means that the index becomes the table itself. It's good to have a clustered index, otherwise the table hangs around in an unordered heap.
Unique and clustered are unrelated properties. You can combine them in any way you like. So in your case, I'd create a unique clustered index. The normal way to do that is by creating the index as a clustered primary key.
The data will not be duplicated if you create a clustered unique index on your three columns.
The unique clustered index will be the data - and the index at the same time :-)
Since this is a three-way join table, this clustered index probably does make a lot of sense. I'd say: go for it!
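A minimal sketch of that suggestion for the three-column join table. The table and constraint names are assumptions, and the foreign keys are omitted because the parent tables are not shown in the question:

```sql
CREATE TABLE dbo.ClientBlogMention
(
    ClientID  int NOT NULL,
    BlogID    int NOT NULL,
    MentionID int NOT NULL,
    -- Unique and clustered at once: the rows themselves are stored in
    -- (ClientID, BlogID, MentionID) order, with no second copy of the data.
    CONSTRAINT PK_ClientBlogMention
        PRIMARY KEY CLUSTERED (ClientID, BlogID, MentionID)
);
```

With ClientID as the leading column, queries that filter on ClientID can seek directly into the clustered index.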
UNIQUE INDEX and UNIQUE CONSTRAINT are somewhat different concepts.
UNIQUE CONSTRAINT is a logical concept and means "make sure this column is unique, no matter how"
UNIQUE INDEX is a physical concept and means "create a B-Tree index on this column and fail whenever duplicates are inserted there"
The latter implies the former but not vice versa.
For instance, in Oracle, if you have a non-unique index on col1:
CREATE UNIQUE INDEX (col1) will fail and say "these columns are already indexed"
ALTER TABLE ADD CONSTRAINT UNIQUE(col1) will succeed and use the existing index to police the constraint.
Use CONSTRAINT if you just want the column to be unique and INDEX if you know a B-Tree index is what you want (to speed up searches etc).
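For comparison, here is how the two forms look in SQL Server rather than Oracle, as a sketch against a hypothetical dbo.Widget table. In SQL Server both end up as a unique index, and the catalog records which one was declared as a constraint:

```sql
CREATE TABLE dbo.Widget
(
    Col1 int NOT NULL,
    Col2 int NOT NULL
);

-- Declared as a constraint (logical intent).
ALTER TABLE dbo.Widget
    ADD CONSTRAINT UQ_Widget_Col1 UNIQUE (Col1);

-- Declared as an index (physical structure).
CREATE UNIQUE INDEX UX_Widget_Col2 ON dbo.Widget (Col2);

-- Both rows are expected to show is_unique = 1; only the first
-- should show is_unique_constraint = 1.
SELECT name, is_unique, is_unique_constraint
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.Widget')
  AND name IS NOT NULL;   -- skip the heap entry, which has no name
```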
How does the PRIMARY KEY keyword relate to clustered indexes in SQL Server?
(Some people seem to want to answer this question instead of a different question I asked, so I am giving them a better place to do so.)
By default, a PRIMARY KEY is implemented as a clustered index. However, you can back it with a nonclustered index as well (by specifying the NONCLUSTERED option in its declaration).
A clustered index is not necessarily a PRIMARY KEY. It can even be non-unique (in this case, a hidden column called uniqueifier is added to each key).
Note that a clustered index is not really an index (i. e. a projection of a table ordered differently, with the references to original records). It is the table itself, with the original records ordered.
When you create a clustered index, you don't really "create" anything that you can drop apart from the table. You just rearrange the table itself and change the way the records are stored.
The clustered index of a table is normally defined on the primary key columns.
This, however is not a strict requirement.
From MSDN:
When you create a PRIMARY KEY constraint, a unique clustered index on the column or columns is automatically created if a clustered index on the table does not already exist and you do not specify a unique nonclustered index.
And:
You can create a clustered index on a column other than primary key column if a nonclustered primary key constraint was specified.
A primary key is, as the name implies, the primary unique identifier for a row in your table. A clustered index physically orders the data according to the index. Although SQL Server will cluster a primary key by default, there is no direct relationship between the two.
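The default described in the MSDN quote can be sketched with two hypothetical tables, dbo.A and dbo.B (names are illustrative only):

```sql
-- No clustered index exists yet, so the primary key becomes clustered.
CREATE TABLE dbo.A
(
    Id int NOT NULL CONSTRAINT PK_A PRIMARY KEY
);

-- Here a clustered index already exists when the primary key is added,
-- so the primary key is created as a nonclustered index.
CREATE TABLE dbo.B
(
    Id        int       NOT NULL,
    CreatedAt datetime2 NOT NULL
);
CREATE CLUSTERED INDEX CIX_B_CreatedAt ON dbo.B (CreatedAt);
ALTER TABLE dbo.B ADD CONSTRAINT PK_B PRIMARY KEY (Id);

-- Compare type_desc (CLUSTERED vs NONCLUSTERED) for PK_A and PK_B.
SELECT OBJECT_NAME(object_id) AS table_name, name, type_desc, is_primary_key
FROM sys.indexes
WHERE object_id IN (OBJECT_ID(N'dbo.A'), OBJECT_ID(N'dbo.B'));
```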