What is different between the two methods of generating cluster primary keys? - sql-server

I have a Table to make a Clustered Primary Key.
CREATE TABLE dbo.SampleTable
(
C1 INT NOT NULL,
C2 INT NOT NULL )
First Way is making Primary Key index with Clustered index.
ALTER TABLE dbo.SampleTable ADD CONSTRAINT IDX_SampleTable PRIMARY KEY CLUSTERED (C1, C2)
Second Way is CREATE CLUSTERED INDEX after ADD CONSTRAINT PRIMARY KEY NONCLUSTERED about same columns.
ALTER TABLE dbo.SampleTable ADD CONSTRAINT IDX_SampleTable PRIMARY KEY NONCLUSTERED (C1, C2)
CREATE CLUSTERED INDEX IDX_SampleTable2 ON dbo.SampleTable (C1 ,C2) -- Can not create Same Name With above Constraint Name
Is there a difference in performance from the above two methods?
Is there a way do not recommend using it?

Yes, there is a difference. By specifying CLUSTERED, you instruct the database to store the data in a certain way. Basically, it enforces that subsequent indexes are stored on subsequent data blocks on the hard drive.
By creating a clustered primary key as in your first statement, all the data in the table will always have unique values in C1, C2 and the data is always stored in subsequent data blocks.
In the second example, you do NOT enforce this CLUSTERED behaviour through the primary key, but through a separate index. Though the effects are the same now, you might choose to remove (or temporarily disable) the index and then the data would no longer be guaranteed to get stored in a CLUSTERED fashion.
Bottom line: In practice these two statements are the same now, but might make a difference in the future because the CLUSTERED property is not integrated in the PK, but in a separate index.

Creating a Nonclustered Primary Key and then creating a Clustered index on the columns within the Primary key is not a good idea. Effectively you'll create 2 indexes on the columns (C1 and C2 in this case), however, it's very unlikely the nonclustered index will ever be used. This is because the Clustered Index is very likely going to be the first choice for the RDBMS, as the pages will be in the order of the Clustered Index. Also, when using a non-clustered index the data engine will still need to refer to the Clustered Index afterwards, to find out the exact location of the row (in the pages).
If you do want a clustered index on your Primary Key(s) then create the key as a Clustered Primary Key. This is not to say that your Primary Key should always be Clustered, but that is a very different subject.

This depends from your datas:
https://learn.microsoft.com/en-gb/sql/relational-databases/indexes/clustered-and-nonclustered-indexes-described?view=sql-server-2017
Clustered indexes sort and store the data rows in the table or view
based on their key values. These are the columns included in the index
definition. There can be only one clustered index per table, because
the data rows themselves can be stored in only one order.
So the clustered key influence the format of your physical data structure.

Related

Multiple Clustered Indexes on a Single Table?

I thought we could only place one clustered index on one table, and put multiple non-clustered indexes on a table, but using the code below I can easily add more than one clustered index to my table.
CREATE CLUSTERED INDEX TBL_MULTI_LC_HIST ON dbo.TBL_MULTI_LC_HIST (ID,AsOfDate)
Is this completely wrong?
It isn't possible to create multiple clustered indexes for a single table. From the docs (emphasis mine):
Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be stored in only one order.
For example this will fail:
CREATE TABLE Thing
(
Column1 INT NOT NULL,
Column2 INT NOT NULL
)
CREATE CLUSTERED INDEX IX1 ON dbo.Thing(Column1)
CREATE CLUSTERED INDEX IX2 ON dbo.Thing(Column2)
Error:
Cannot create more than one clustered index on table 'dbo.Thing'. Drop the existing clustered index 'IX1' before creating another.
Example: http://www.sqlfiddle.com/#!18/53a63/1
You can however have a single index with multiple columns in it which is perhaps where you are getting confused:
CREATE CLUSTERED INDEX IX3 ON dbo.Thing(Column1, Column2)
You can only have one clustered index. A "Clustered" index IS the row... it contains all the columns. Every other index would just contain a pointer to the clustered row. The key of the clustered index enforces an 'ordering' on the rows by default.
If there is no clustered index, then the rows are basically stored in a heap, with no order or structure.

Should the PK on an identity column (which is surrogate key) be non-clustered?

For a table with PK on an identity column, it will be clustered by default. Could it better be non-clustered? The PK is a surrogate key which may never be used for querying directly, it may be used to join another table.
The reason is other indexes will be created for queries. A query which uses a non-clustered index and returned columns are not covered by the index will use less LIO because there is no extra clustered index seek steps?
create table T (
Id int identity(1,1) primary key, -- clustered or non-clustered?
A ....
B ....
C ....
....)
create index ix_A on T (A)
create index ix_..... -- Many indexes can be created for different queries
select A, B
from T
where A between #a and #a+5 -- This query will have less LIO if the PK is non-clustered (seek)
It's perfectly fine to set your surrogate PK to be non-clustered if there is a better candidate in the table for the clustered index.
Good candidates for a clustered index are columns that you will frequently do either range searches ([ColumnName] BETWEEN This AND That) on, or ORDER BY clauses on.

T-SQL Clustered Foreign Key

The "Create Table" grammar rather clearly does not allow me to specify a clustered foreign key constraint. In other words, this is illegal:
--keyword CLUSTERED must be removed before this will execute...
CREATE TABLE [Content](
[ID] [int] NOT NULL CONSTRAINT PK_Content_ID PRIMARY KEY,
ContentDefID int NOT NULL CONSTRAINT FK_Plugin_ContentDef FOREIGN KEY CLUSTERED REFERENCES ContentDef(ID)
)
GO
But I don't understand why it is illegal. ISTM that clustering a foreign-key would facilitate performance of paged-lookups. In other words, "give me child items 80 through 140 of parent ID 20".
Is there a rationale for this?
Update
Based on Oded and Tvanfosson feedback, I've found that the following works:
CREATE TABLE [Content](
[ID] [int] NOT NULL CONSTRAINT PK_Content_ID PRIMARY KEY,
ContentDefID int NOT NULL UNIQUE CLUSTERED CONSTRAINT FK_ContentDefContent FOREIGN KEY REFERENCES ContentDef(ID)
)
GO
But the above causes more problems than it solves. First, a "UNIQUE" foreign key forces my relationship to be one-to-one which I don't want. Second, this only works because it represents the creation of two separate constraints, rather than a single CLUSTERED FOREIGN KEY.
But this investigation is getting me closer to my answer. Evidently clustered indexes MUST be unique, as stated here on SO. Quoting:
If the clustered index is not a unique index, SQL Server makes any duplicate keys unique by adding an internally generated value called a uniqueifier
In particular, I think this answer covers it.
As others have explained, the clustered index does not have to be the primary key but it either has to be unique or SQL-Server adds a (not shown) UNIQUIFIER column to it.
To avoid this, you can make the clustered index unique by explicitly adding the primary key column to the clustered index, like below. The index will then be avaialbel to be used by the foreign key constraints (and for queries, like joining the two tables).
Notice, that as #Martin Smith has explained, the concepts of CONSTRAINT and INDEX are different. And the various DBMSs implement these in different ways. SQL-Server automatically creates an index for some constraints, while it doesn't for foreign key constraints. It's advised though to have an index that the constraint can use (when deleting or updating in the referenced table):
CREATE TABLE Content(
ID int NOT NULL,
ContentDefID int NOT NULL,
CONSTRAINT PK_Content_ID
PRIMARY KEY NONCLUSTERED (ID),
CONSTRAINT CI_Content
UNIQUE CLUSTERED (ContentDefID, ID),
CONSTRAINT FK_Plugin_ContentDef
FOREIGN KEY (ContentDefID) REFERENCES ContentDef(ID)
) ;
Is there a rationale for this?
You might as well ask why you can't create a CLUSTERED check constraint or a CLUSTERED default constraint.
A foreign key simply defines a logical constraint and has no indexes automatically created for it in SQL Server (this only happens for UNIQUE or PRIMARY KEY constraints). It is always the case in SQL Server that if you want the FK columns indexed you need to run a CREATE INDEX on the relevant column(s) yourself.
Therefore the concept of a CLUSTERED FOREIGN KEY doesn't make any sense. You can of course create a CLUSTERED INDEX on the columns making up the FK though as you indicate in your question.
You can only have one clustered index on a table. By default this will be the primary key column.
There are ways to change this - you will need to use PRIMARY KEY NONCLUSTERED and UNIQUE CLUSTERED FOREIGN KEY.
It seems you're conflating the ideas of the clustered index with keys (either primary or foreign). Why not just make the table and then specify its clustered index afterwards? (code copied from your first example and changed as little as possible)
CREATE TABLE [Content](
[ID] [int] NOT NULL CONSTRAINT PK_Content_ID PRIMARY KEY NONCLUSTERED,
ContentDefID int NOT NULL CONSTRAINT FK_Plugin_ContentDef FOREIGN KEY REFERENCES ContentDef(ID)
)
GO
CREATE CLUSTERED INDEX IX_Content_Clustered on Content(ContentDefID)
There's no need for you to make the clustered index unique

Primary keys without defaul index (sort) - SQL2005

How do I switch off the default index on primary keys
I dont want all my tables to be indexed (sorted) but they must have a primary key
You can define a primary key index as NONCLUSTERED to prevent the table rows from being ordered according to the primary key, but you cannot define a primary key without some associated index.
Tables are always unsorted - there is no "default" order for a table and the optimiser may or may not choose to use an index if one exists.
In SQL Server an index is effectively the only way to implement a key. You get a choice between clustered or nonclustered indexes - that is all.
The means by which SQL Server implements Primary and Unique keys is by placing an index on those columns. So you cannot have a Primary Key (or Unique constraint) without an index.
You can tell SQL Server to use a nonclustered index to implement these indexes. If there are only nonclustered indexes on a table (or no indexes at all), you have a heap. It's pretty rare that this is what you actually want.
Just because a table has a clustered index, this in no way indicates that the rows of the table will be returned in the "order" defined by such an index - the fact that the rows are usually returned in that order is an implementation quirk.
And the actual code would be:
CREATE TABLE T (
Column1 char(1) not null,
Column2 char(1) not null,
Column3 char(1) not null,
constraint PK_T PRIMARY KEY NONCLUSTERED (Column2,Column3)
)
What does " I dont want all my tables to be sorted" mean ? If it means that you want the rows to appear in the order where they've been entered, there's only one way to garantee it: have a field that stores that order (or the time if you don't have a lot of transactions). And in that case, you will want to have a clustered index on that field for best performance.
You might end up with a non clustered PK (like the productId) AND a clustered unique index on your autonumber_or_timestamp field for max performance.
But that's really depending on the reality your're trying to model, and your question contains too little information about this. DB design is NOT abstract thinking.

Database clustered Index on referring column

I have two tables like this.
Table A
A_ID ( Primary Key)
A1
A2
Table B
B_ID ( Primary Key)
B1
B2
A_ID ( foreign key but not enforced in database, not unique )
Although by default SQL server creates clustered indexes on B_ID, I have lot of select queries on B, which depend on A_ID
something like this
SELECT * FROM B WHERE B.A_ID = 'SOME ID'
Should I be creating clustered Index on A_ID of TABLE B ?
No a regular non-clustered index should do fine. A clustered index is especially handy when doing range queries (BETWEEN) As a rule of thumb I always create non-clustered indexes on columns used in foreign key constraints.
No, just create a normal non-clustered index - you'll have basically the same results and same improvements in your query speed.
Is the A_ID on table B even guaranteed to be unique?? Couldn't more than 1 entry in "B" reference the same A_ID ??
Best practice for a clustered key is:
as small as possible (narrow)
unique
stable (or static - it should never change)
ever increasing
See Kim Tripp's The Clustered Index Debate continues or Ever-increasing clustering key - the Clustered Index Debate - agin! for additional info.
If your clustered key cannot be guaranteed to be unique, SQL Server will add a 4-byte uniquifier to it - you'll want to avoid that whenever possible (because your clustering key will be added to every single entry of every single non-clustered index on your table, leading to waste of space if it's too wide).
Marc

Resources