T-SQL Clustered Foreign Key - sql-server

The "Create Table" grammar rather clearly does not allow me to specify a clustered foreign key constraint. In other words, this is illegal:
--keyword CLUSTERED must be removed before this will execute...
CREATE TABLE [Content](
[ID] [int] NOT NULL CONSTRAINT PK_Content_ID PRIMARY KEY,
ContentDefID int NOT NULL CONSTRAINT FK_Plugin_ContentDef FOREIGN KEY CLUSTERED REFERENCES ContentDef(ID)
)
GO
But I don't understand why it is illegal. It seems to me that clustering on a foreign key would improve the performance of paged lookups, for example: "give me child items 80 through 140 of parent ID 20".
Is there a rationale for this?
Update
Based on feedback from Oded and Tvanfosson, I've found that the following works:
CREATE TABLE [Content](
[ID] [int] NOT NULL CONSTRAINT PK_Content_ID PRIMARY KEY,
ContentDefID int NOT NULL UNIQUE CLUSTERED CONSTRAINT FK_ContentDefContent FOREIGN KEY REFERENCES ContentDef(ID)
)
GO
But the above causes more problems than it solves. First, a "UNIQUE" foreign key forces my relationship to be one-to-one which I don't want. Second, this only works because it represents the creation of two separate constraints, rather than a single CLUSTERED FOREIGN KEY.
But this investigation is getting me closer to my answer. Evidently a clustered index must be unique, at least internally, as stated here on SO. Quoting:
If the clustered index is not a unique index, SQL Server makes any duplicate keys unique by adding an internally generated value called a uniqueifier
In particular, I think this answer covers it.

As others have explained, the clustered index does not have to be the primary key, but it either has to be unique or SQL Server adds a hidden uniquifier to it.
To avoid this, you can make the clustered index unique by explicitly adding the primary key column to it, as below. The index will then be available for use by the foreign key constraint (and by queries, such as joins between the two tables).
Notice that, as @Martin Smith has explained, the concepts of CONSTRAINT and INDEX are different, and the various DBMSs implement them in different ways. SQL Server automatically creates an index for some constraints (PRIMARY KEY and UNIQUE), but not for foreign key constraints. It is still advisable to have an index that the constraint can use (when deleting or updating rows in the referenced table):
CREATE TABLE Content(
ID int NOT NULL,
ContentDefID int NOT NULL,
CONSTRAINT PK_Content_ID
PRIMARY KEY NONCLUSTERED (ID),
CONSTRAINT CI_Content
UNIQUE CLUSTERED (ContentDefID, ID),
CONSTRAINT FK_Plugin_ContentDef
FOREIGN KEY (ContentDefID) REFERENCES ContentDef(ID)
) ;
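The answer's table also supports the paged lookup from the question. The script below is a sketch under these assumptions: SQL Server 2016 or later (for DROP TABLE IF EXISTS), a scratch database, a minimal hypothetical ContentDef table (the question never shows its definition) and made-up sample rows. The clustered key starts with ContentDefID and continues with ID, so the WHERE filter and the ORDER BY follow the index order. That should let the optimizer read parent 20's rows in ID order instead of sorting them.

```sql
-- Sketch only: assumes SQL Server 2016+ and a scratch database.
-- ContentDef is a hypothetical stand-in; the question never shows its definition.
DROP TABLE IF EXISTS dbo.Content;
DROP TABLE IF EXISTS dbo.ContentDef;

CREATE TABLE dbo.ContentDef (
    ID int NOT NULL CONSTRAINT PK_ContentDef PRIMARY KEY
);

CREATE TABLE dbo.Content (
    ID int NOT NULL,
    ContentDefID int NOT NULL,
    CONSTRAINT PK_Content_ID PRIMARY KEY NONCLUSTERED (ID),
    CONSTRAINT CI_Content UNIQUE CLUSTERED (ContentDefID, ID),
    CONSTRAINT FK_Plugin_ContentDef
        FOREIGN KEY (ContentDefID) REFERENCES dbo.ContentDef (ID)
);

-- Made-up data: IDs 1-200 belong to parent 20, IDs 201-400 to parent 21.
INSERT INTO dbo.ContentDef (ID) VALUES (20), (21);

WITH n AS (
    SELECT TOP (400) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
    FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b
)
INSERT INTO dbo.Content (ID, ContentDefID)
SELECT i, CASE WHEN i <= 200 THEN 20 ELSE 21 END
FROM n;

-- Child items 80 through 140 of parent 20: skip 79 rows, return the next 61.
SELECT ID, ContentDefID
FROM dbo.Content
WHERE ContentDefID = 20
ORDER BY ID
OFFSET 79 ROWS FETCH NEXT 61 ROWS ONLY;
```

OFFSET/FETCH needs SQL Server 2012 or later. The page boundaries are computed from the ID ordering, so "items 80 through 140" here means the 80th to 140th child by ID.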

Is there a rationale for this?
You might as well ask why you can't create a CLUSTERED check constraint or a CLUSTERED default constraint.
A foreign key simply defines a logical constraint and has no indexes automatically created for it in SQL Server (this only happens for UNIQUE or PRIMARY KEY constraints). It is always the case in SQL Server that if you want the FK columns indexed you need to run a CREATE INDEX on the relevant column(s) yourself.
Therefore the concept of a CLUSTERED FOREIGN KEY doesn't make any sense. You can of course create a CLUSTERED INDEX on the columns making up the FK though as you indicate in your question.
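You can check this yourself by creating a foreign key and then looking at sys.indexes. The sketch below assumes SQL Server 2016+ and uses hypothetical table names (FkParent, FkChild). The child's primary key is declared NONCLUSTERED, so the table starts as a heap. The FOREIGN KEY constraint adds nothing to sys.indexes. The clustered index on the FK column exists only because it is created explicitly.

```sql
-- Sketch: hypothetical tables, SQL Server 2016+ (DROP TABLE IF EXISTS).
DROP TABLE IF EXISTS dbo.FkChild;
DROP TABLE IF EXISTS dbo.FkParent;

CREATE TABLE dbo.FkParent (
    ID int NOT NULL CONSTRAINT PK_FkParent PRIMARY KEY
);

CREATE TABLE dbo.FkChild (
    ID int NOT NULL CONSTRAINT PK_FkChild PRIMARY KEY NONCLUSTERED,
    ParentID int NOT NULL
        CONSTRAINT FK_FkChild_FkParent REFERENCES dbo.FkParent (ID)
);

-- Before: a HEAP row (index_id 0) plus the primary key's nonclustered index.
-- Nothing was created for FK_FkChild_FkParent.
SELECT index_id, name, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.FkChild');

-- Indexing the FK column is a separate, explicit step.
CREATE CLUSTERED INDEX IX_FkChild_ParentID ON dbo.FkChild (ParentID);

-- After: the heap row is replaced by the clustered index.
SELECT index_id, name, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.FkChild');
```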

You can only have one clustered index on a table. By default, it is the one created for the primary key.
There are ways to change this: declare the key as PRIMARY KEY NONCLUSTERED, and put a UNIQUE CLUSTERED constraint (or a clustered index) on the foreign key column(s).

It seems you're conflating the ideas of the clustered index with keys (either primary or foreign). Why not just make the table and then specify its clustered index afterwards? (code copied from your first example and changed as little as possible)
CREATE TABLE [Content](
[ID] [int] NOT NULL CONSTRAINT PK_Content_ID PRIMARY KEY NONCLUSTERED,
ContentDefID int NOT NULL CONSTRAINT FK_Plugin_ContentDef FOREIGN KEY REFERENCES ContentDef(ID)
)
GO
CREATE CLUSTERED INDEX IX_Content_Clustered on Content(ContentDefID)
There's no need for you to make the clustered index unique; SQL Server adds a uniquifier to rows with duplicate key values.
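To see the non-unique clustered index accept several children for one parent, here is a sketch that runs this answer's script against a hypothetical ContentDef table. It assumes SQL Server 2016+, and the sample rows are made up. sys.indexes should report is_unique = 0 for the clustered index. The duplicate ContentDefID values are stored without error.

```sql
-- Sketch: SQL Server 2016+; ContentDef is a hypothetical stand-in table.
DROP TABLE IF EXISTS dbo.Content;
DROP TABLE IF EXISTS dbo.ContentDef;

CREATE TABLE dbo.ContentDef (
    ID int NOT NULL CONSTRAINT PK_ContentDef PRIMARY KEY
);

-- The answer's table definition, apart from the schema prefix.
CREATE TABLE dbo.Content (
    [ID] [int] NOT NULL CONSTRAINT PK_Content_ID PRIMARY KEY NONCLUSTERED,
    ContentDefID int NOT NULL
        CONSTRAINT FK_Plugin_ContentDef FOREIGN KEY REFERENCES dbo.ContentDef (ID)
);

CREATE CLUSTERED INDEX IX_Content_Clustered ON dbo.Content (ContentDefID);

-- Several children for the same parent: duplicate clustered keys are allowed.
INSERT INTO dbo.ContentDef (ID) VALUES (20);
INSERT INTO dbo.Content (ID, ContentDefID) VALUES (1, 20), (2, 20), (3, 20);

SELECT name, type_desc, is_unique
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.Content');
```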

Related

What is different between the two methods of generating cluster primary keys?

I want to create a clustered primary key on this table:
CREATE TABLE dbo.SampleTable
(
C1 INT NOT NULL,
C2 INT NOT NULL )
The first way creates the primary key constraint with a clustered index:
ALTER TABLE dbo.SampleTable ADD CONSTRAINT IDX_SampleTable PRIMARY KEY CLUSTERED (C1, C2)
The second way adds a PRIMARY KEY NONCLUSTERED constraint and then runs CREATE CLUSTERED INDEX on the same columns:
ALTER TABLE dbo.SampleTable ADD CONSTRAINT IDX_SampleTable PRIMARY KEY NONCLUSTERED (C1, C2)
CREATE CLUSTERED INDEX IDX_SampleTable2 ON dbo.SampleTable (C1, C2) -- cannot reuse the name above: the constraint's index already has it
Is there a performance difference between these two methods?
Is either of them not recommended?
Yes, there is a difference. By specifying CLUSTERED, you instruct the database to store the data in a certain way: the table's rows are stored in the leaf level of that index, ordered by its key.
By creating a clustered primary key as in your first statement, all the data in the table will always have unique values in (C1, C2), and the rows are always stored in key order.
In the second example, you do NOT enforce this CLUSTERED behaviour through the primary key, but through a separate index. Though the effects are the same now, you might later drop that index, and the table would then no longer be stored in a clustered fashion (it would become a heap), while the primary key would remain.
Bottom line: In practice these two statements are the same now, but might make a difference in the future because the CLUSTERED property is not integrated in the PK, but in a separate index.
Creating a Nonclustered Primary Key and then creating a Clustered index on the columns within the Primary key is not a good idea. Effectively you'll create 2 indexes on the columns (C1 and C2 in this case), however, it's very unlikely the nonclustered index will ever be used. This is because the Clustered Index is very likely going to be the first choice for the RDBMS, as the pages will be in the order of the Clustered Index. Also, when using a non-clustered index the data engine will still need to refer to the Clustered Index afterwards, to find out the exact location of the row (in the pages).
If you do want a clustered index on your Primary Key(s) then create the key as a Clustered Primary Key. This is not to say that your Primary Key should always be Clustered, but that is a very different subject.
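Building both variants side by side makes the difference visible in the catalog. This is a sketch that assumes SQL Server 2016+. The table and constraint names are hypothetical, because a constraint name must be unique within its schema and the two tables cannot both use IDX_SampleTable. The first table ends up with one index and the second with two. The second table's clustered index is not marked unique, because CREATE CLUSTERED INDEX was run without UNIQUE.

```sql
-- Sketch: hypothetical names, SQL Server 2016+.
DROP TABLE IF EXISTS dbo.SampleTableA;
DROP TABLE IF EXISTS dbo.SampleTableB;

CREATE TABLE dbo.SampleTableA (C1 INT NOT NULL, C2 INT NOT NULL);
CREATE TABLE dbo.SampleTableB (C1 INT NOT NULL, C2 INT NOT NULL);

-- First way: the primary key itself is the clustered index.
ALTER TABLE dbo.SampleTableA
    ADD CONSTRAINT PK_SampleTableA PRIMARY KEY CLUSTERED (C1, C2);

-- Second way: nonclustered primary key plus a separate clustered index.
ALTER TABLE dbo.SampleTableB
    ADD CONSTRAINT PK_SampleTableB PRIMARY KEY NONCLUSTERED (C1, C2);
CREATE CLUSTERED INDEX CIX_SampleTableB ON dbo.SampleTableB (C1, C2);

-- One index for A; two for B, whose clustered index has is_unique = 0.
SELECT OBJECT_NAME(object_id) AS table_name, name, type_desc, is_unique, is_primary_key
FROM sys.indexes
WHERE object_id IN (OBJECT_ID(N'dbo.SampleTableA'), OBJECT_ID(N'dbo.SampleTableB'))
ORDER BY table_name, index_id;
```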
This depends on your data:
https://learn.microsoft.com/en-gb/sql/relational-databases/indexes/clustered-and-nonclustered-indexes-described?view=sql-server-2017
Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be stored in only one order.
So the clustered key influences the physical structure of your data.

How does the indexing of GUID keys and int keys in SQL Server work? [duplicate]

This question already has answers here:
What are the best practices for using a GUID as a primary key, specifically regarding performance? [closed]
I have been creating a table in SQL Server that uses a GUID, and for that I reviewed this question:
What are the best practices for using a GUID as a primary key, specifically regarding performance?
One answer there tells me to use a GUID key plus a second key column with IDENTITY, like this:
CREATE TABLE dbo.MyTable
(PKGUID UNIQUEIDENTIFIER NOT NULL,
MyINT INT IDENTITY (1,1) NOT NULL
-- , ... add more columns as needed ...
)
ALTER TABLE dbo.MyTable
ADD CONSTRAINT PK_MyTable
PRIMARY KEY NONCLUSTERED (PKGUID)
CREATE UNIQUE CLUSTERED INDEX CIX_MyTable ON dbo.MyTable (MyINT)
My question is: why use an int key when it says that we are already using the GUID?
The confusion stems from the multiple meanings of the word "key" in the answer; to wit, the answer states:
if you want to have your PKGUID column as your primary key (but not
your clustering key), and another column MYINT (INT IDENTITY) as your
clustering key...
I would change it to say (replacing "clustering key" with "clustered index"):
if you want to have your PKGUID column as your primary key (but not
your clustered index), and another column MYINT (INT IDENTITY) as your
clustered index...
The point is that, by default, a PRIMARY KEY is built on a clustered index (unless you specify otherwise), and the clustered index key is included in every other (nonclustered) index. In the case of a GUID as a clustered PK, this can be a significant performance bottleneck. The code you posted is a compromise: it satisfies the "need" to have a GUID as the primary key, while clustering on a smaller column (which can lead to a performance boost).
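The size argument can be made concrete with a little arithmetic. A uniqueidentifier is 16 bytes and an int is 4 bytes, and the clustering key is carried in each row of every nonclustered index. The figures below (1,000,000 rows, 3 nonclustered indexes) are hypothetical and ignore row overhead. The query only illustrates how the per-row difference multiplies.

```sql
DECLARE @rows bigint = 1000000;        -- hypothetical row count
DECLARE @nonclusteredIndexes int = 3;  -- hypothetical number of nonclustered indexes

SELECT
    DATALENGTH(NEWID())        AS guid_key_bytes,  -- 16
    DATALENGTH(CAST(1 AS int)) AS int_key_bytes,   -- 4
    @rows * @nonclusteredIndexes * DATALENGTH(NEWID())        AS guid_key_bytes_total,
    @rows * @nonclusteredIndexes * DATALENGTH(CAST(1 AS int)) AS int_key_bytes_total;
-- guid_key_bytes_total = 48000000, int_key_bytes_total = 12000000
```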
It's not ideal, but it can be a very useful method. If you'd like to read up more on the differences between keys and indexes, here are some useful links:
What is the difference between a primary key and a index key
https://itknowledgeexchange.techtarget.com/sql-server/difference-between-an-index-and-a-primary-key/
When should I use primary key or index?

What is the impact of creating a table with a unique index but no primary key?

What is the best way to make a simple many-to-many cross reference table which contains nothing but two columns which are themselves primary keys in other tables?
Does anyone have concrete evidence for or against creating a table with a single unique index, but no primary key? (Alternatives are detailed below).
Put another way: how does SQL Server internally identify individual rows a) in a table that has a primary key and b) in a table that does not?
In detail:
Given the input tables:
CREATE TABLE Foo ( FooID bigint identity(1,1) not null primary key, other stuff... )
CREATE TABLE Bar ( BarID bigint identity(1,1) not null primary key, other stuff... )
The three basic options are (in all cases assume a foreign key is created on the FooID and BarID columns):
-- Option 1: Compound primary key
CREATE TABLE FooBarXRef (
FooID bigint not null
, BarID bigint not null
, PRIMARY KEY ( FooID, BarID )
, CONSTRAINT FK... etc
)
-- Option 2: Independent primary key + unique index
CREATE TABLE FooBarXRef (
FooBarXRefID bigint identity(1,1) not null primary key
, FooID bigint not null
, BarID bigint not null
, CONSTRAINT FK... etc
);
CREATE UNIQUE INDEX I_FooBarXRef_FooBar ON FooBarXRef ( FooID, BarID );
-- Option 3: Unique index, no explicit primary key:
CREATE TABLE FooBarXRef (
FooID bigint not null
, BarID bigint not null
, CONSTRAINT FK... etc
);
CREATE UNIQUE INDEX I_FooBarXRef_FooBar ON FooBarXRef ( FooID, BarID );
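For comparison, here is a runnable sketch of Option 1. It assumes SQL Server 2016+. The table names come from the question. The FK constraint names are hypothetical stand-ins for "CONSTRAINT FK... etc", and "other stuff" is left out. The compound primary key also acts as the uniqueness rule, so inserting the same (FooID, BarID) pair twice is rejected.

```sql
-- Sketch of Option 1; constraint names are hypothetical, SQL Server 2016+.
DROP TABLE IF EXISTS dbo.FooBarXRef;
DROP TABLE IF EXISTS dbo.Foo;
DROP TABLE IF EXISTS dbo.Bar;

CREATE TABLE dbo.Foo (FooID bigint IDENTITY(1,1) NOT NULL PRIMARY KEY);
CREATE TABLE dbo.Bar (BarID bigint IDENTITY(1,1) NOT NULL PRIMARY KEY);

-- Compound primary key; it is clustered by default.
CREATE TABLE dbo.FooBarXRef (
    FooID bigint NOT NULL,
    BarID bigint NOT NULL,
    CONSTRAINT PK_FooBarXRef PRIMARY KEY (FooID, BarID),
    CONSTRAINT FK_FooBarXRef_Foo FOREIGN KEY (FooID) REFERENCES dbo.Foo (FooID),
    CONSTRAINT FK_FooBarXRef_Bar FOREIGN KEY (BarID) REFERENCES dbo.Bar (BarID)
);

INSERT INTO dbo.Foo DEFAULT VALUES;  -- FooID 1
INSERT INTO dbo.Foo DEFAULT VALUES;  -- FooID 2
INSERT INTO dbo.Bar DEFAULT VALUES;  -- BarID 1
INSERT INTO dbo.Bar DEFAULT VALUES;  -- BarID 2

INSERT INTO dbo.FooBarXRef (FooID, BarID) VALUES (1, 1), (1, 2), (2, 1);

-- A repeated pair violates the primary key and is rejected.
BEGIN TRY
    INSERT INTO dbo.FooBarXRef (FooID, BarID) VALUES (1, 1);
END TRY
BEGIN CATCH
    PRINT 'Rejected duplicate pair: ' + ERROR_MESSAGE();
END CATCH;
```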
Is a separate identity PK on the xref table redundant? Might it needlessly introduce another layer of constraint checking for the database engine?
On the other hand, are multi-column primary keys problematic? What about the proposed solution of having the xref table contain only the two foreign key columns, with a unique index on them but no primary key at all?
I suspect that doing so will cause SQL Server to create an internal primary key for the purpose of uniquely identifying each row, thus yielding the same redundant constraints as if a primary key were defined explicitly, but I have no proof or documentation to support this. Other questions and answers suggest that there is no internal primary key by default (i.e. no equivalent to the Oracle ROWID), since %%physloc%% only indicates where a row is currently stored and is therefore subject to change. My intuition is that the engine must create something to uniquely identify a row in order to implement cursors, transactions, and concurrency.
The concept of a primary key is really about relational theory; maintaining referential integrity by building relationships across multiple tables. The SQL Server engine, by default, creates a unique clustered index when a primary key is built (assuming a clustered index doesn't exist at the moment).
It's this clustered index that defines a unique row at the leaf level. For tables that have a non-unique clustered index, SQL Server adds a 4-byte "uniquifier" to the key of rows whose key values are duplicates.
TestTable1: Primary Key
TestTable2: Primary Key & Unique Nonclustered
TestTable3: Unique Clustered
TestTable4: Primary Key Clustered (same as TestTable1 and TestTable3; since a primary key can also be defined on a nonclustered index, I prefer to always state which structure I want).
TestTable2 is redundant: it creates a unique clustered index that stores all the records at its leaf level, and then a unique nonclustered index that enforces uniqueness again. Any change to the table has to hit both the clustered and the nonclustered index.
TestTable1, TestTable3 and TestTable4 are a tie in my book: a unique clustered index is created for each of them, and there is no physical difference in the way records are stored on a page.
However, SQL Server transactional replication requires a primary key on every published table. If you'll be using replication in the future, you may want to make sure all your unique clustered indexes are primary keys as well.
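The verifying scripts are only linked externally, so the following is a hypothetical reconstruction of two of the variants (TestTable2 and TestTable3), not the answerer's original code. It assumes SQL Server 2016+ and a single int column. Counting rows in sys.indexes shows the extra structure that TestTable2 has to maintain.

```sql
-- Hypothetical reconstruction, not the original hastebin script. SQL Server 2016+.
DROP TABLE IF EXISTS dbo.TestTable2;
DROP TABLE IF EXISTS dbo.TestTable3;

-- TestTable2: primary key (clustered by default) plus a unique nonclustered
-- constraint on the same column, i.e. two structures enforcing one rule.
CREATE TABLE dbo.TestTable2 (
    ID int NOT NULL CONSTRAINT PK_TestTable2 PRIMARY KEY,
    CONSTRAINT UQ_TestTable2_ID UNIQUE NONCLUSTERED (ID)
);

-- TestTable3: a unique clustered constraint and no primary key.
CREATE TABLE dbo.TestTable3 (
    ID int NOT NULL,
    CONSTRAINT UQ_TestTable3_ID UNIQUE CLUSTERED (ID)
);

SELECT OBJECT_NAME(object_id) AS table_name, name, type_desc, is_unique, is_primary_key
FROM sys.indexes
WHERE object_id IN (OBJECT_ID(N'dbo.TestTable2'), OBJECT_ID(N'dbo.TestTable3'))
ORDER BY table_name, index_id;
```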
I seem to be unable to paste in my verifying scripts, so here they are on hastebin.
http://hastebin.com/qucajimixi.vbs
Well, it all depends on the requirement. As far as I know,
PRIMARY KEY = UNIQUE KEY + NOT NULL
What this tells you is that you can have multiple NOT NULL UNIQUE indexes (nonclustered) in a table, but you cannot have multiple primary keys in a table (and the one primary key is clustered by default).
I am a huge believer in the relational database model and in working with PRIMARY KEY/FOREIGN KEY relationships. Replication (transactional replication, in SQL Server) requires a primary key on each published table; therefore, it is always good practice to create a primary key instead of only UNIQUE keys for your table.
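Here is a small sketch of that rule. It assumes SQL Server 2016+ and a hypothetical KeyDemo table. The table carries several NOT NULL UNIQUE constraints alongside its primary key, but an attempt to add a second primary key fails. The error is caught and printed rather than asserted, since the exact message is not the point.

```sql
-- Sketch: hypothetical table, SQL Server 2016+.
DROP TABLE IF EXISTS dbo.KeyDemo;

CREATE TABLE dbo.KeyDemo (
    ID    int          NOT NULL CONSTRAINT PK_KeyDemo PRIMARY KEY,   -- the one primary key
    Code  char(4)      NOT NULL CONSTRAINT UQ_KeyDemo_Code UNIQUE,   -- unique key #1
    Email varchar(100) NOT NULL CONSTRAINT UQ_KeyDemo_Email UNIQUE   -- unique key #2
);

-- A second primary key is not allowed, whatever its clustering option.
BEGIN TRY
    ALTER TABLE dbo.KeyDemo ADD CONSTRAINT PK_KeyDemo2 PRIMARY KEY NONCLUSTERED (Code);
END TRY
BEGIN CATCH
    PRINT 'Second primary key rejected: ' + ERROR_MESSAGE();
END CATCH;

-- One PK constraint and two UQ constraints remain.
SELECT name, type, type_desc
FROM sys.key_constraints
WHERE parent_object_id = OBJECT_ID(N'dbo.KeyDemo');
```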

Foreign Key Referencing a Technical Key

So, I've got a table created like so:
create table CharacterSavingThrow
(
CharacterCode int not null,
constraint FK_CharacterSavingThrowCharacterID foreign key (CharacterCode) references Character(CharacterCode),
FortitudeSaveCode int not null,
constraint FK_CharacterSavingThrowFortitudeSaveCode foreign key (FortitudeSaveCode) references SavingThrow(SavingThrowCode),
ReflexSaveCode int not null,
constraint FK_CharacterSavingThrowReflexSaveCode foreign key (ReflexSaveCode) references SavingThrow(SavingThrowCode),
WillSaveCode int not null,
constraint FK_CharacterSavingThrowWillSaveCode foreign key (WillSaveCode) references SavingThrow(SavingThrowCode),
constraint PK_CharacterSavingThrow primary key clustered (CharacterCode, FortitudeSaveCode, ReflexSaveCode, WillSaveCode)
)
I need to know how to reference the primary key of this table from another table's constraint. Seems like a pretty simple question: either it's possible or not, right? Thanks for your help!
Yes, that's easy: you just have to specify the complete compound key, i.e. your other table also needs to have the four columns that make up the PK here, and then the FK constraint would be:
ALTER TABLE dbo.YourOtherTable
ADD CONSTRAINT FK_YourOtherTable_CharacterSavingThrow
FOREIGN KEY(CharacterCode, FortitudeSaveCode, ReflexSaveCode, WillSaveCode)
REFERENCES dbo.CharacterSavingThrow(CharacterCode, FortitudeSaveCode, ReflexSaveCode, WillSaveCode)
The point is: if you have a compound primary key (made up of more than one column), any other table wanting to reference that table also must have all those columns and use all those columns for the FK relationship.
Also, if you're writing queries that would join those two tables - you would have to use all columns contained in the compound PK for your joins.
That's one of the main drawbacks of using four columns as a PK - it makes FK relationships and JOIN queries awfully cumbersome and really annoying to write and use. For that reason, in such a case, I would probably opt to use a separate surrogate key in the table - e.g. introduce a new INT IDENTITY on your dbo.CharacterSavingThrow table to act as primary key, that would make it a lot easier to reference that table and write JOIN queries that use that table.
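Here is a sketch of the surrogate-key alternative. It assumes SQL Server 2016+. CharacterSavingThrowID, UQ_CharacterSavingThrow and the child table CharacterSavingThrowNote are hypothetical names. The foreign keys to Character and SavingThrow are left out so the script runs on its own. The UNIQUE constraint keeps the original four-column combination unique, while child tables reference a single int column.

```sql
-- Sketch: hypothetical names; FKs to Character/SavingThrow omitted. SQL Server 2016+.
DROP TABLE IF EXISTS dbo.CharacterSavingThrowNote;
DROP TABLE IF EXISTS dbo.CharacterSavingThrow;

CREATE TABLE dbo.CharacterSavingThrow (
    -- Surrogate key: one narrow column for other tables to reference.
    CharacterSavingThrowID int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_CharacterSavingThrow PRIMARY KEY CLUSTERED,
    CharacterCode     int NOT NULL,
    FortitudeSaveCode int NOT NULL,
    ReflexSaveCode    int NOT NULL,
    WillSaveCode      int NOT NULL,
    -- The natural key is still enforced, just not as the primary key.
    CONSTRAINT UQ_CharacterSavingThrow
        UNIQUE (CharacterCode, FortitudeSaveCode, ReflexSaveCode, WillSaveCode)
);

-- Hypothetical child table: a single-column FK instead of four columns.
CREATE TABLE dbo.CharacterSavingThrowNote (
    NoteID int IDENTITY(1,1) NOT NULL CONSTRAINT PK_CharacterSavingThrowNote PRIMARY KEY,
    CharacterSavingThrowID int NOT NULL
        CONSTRAINT FK_CharacterSavingThrowNote_CharacterSavingThrow
        REFERENCES dbo.CharacterSavingThrow (CharacterSavingThrowID),
    Note nvarchar(200) NOT NULL
);

INSERT INTO dbo.CharacterSavingThrow (CharacterCode, FortitudeSaveCode, ReflexSaveCode, WillSaveCode)
VALUES (1, 10, 20, 30);

-- SCOPE_IDENTITY() returns the surrogate key generated by the insert above.
INSERT INTO dbo.CharacterSavingThrowNote (CharacterSavingThrowID, Note)
VALUES (SCOPE_IDENTITY(), N'first note');

-- The join needs one column instead of four.
SELECT c.CharacterCode, n.Note
FROM dbo.CharacterSavingThrow AS c
JOIN dbo.CharacterSavingThrowNote AS n
    ON n.CharacterSavingThrowID = c.CharacterSavingThrowID;
```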

PRIMARY KEYs vs. UNIQUE Constraints

In an Alexander Kuznetsov article, he presents the following code snippet:
CREATE TABLE dbo.Vehicles(
ID INT NOT NULL,
[Type] VARCHAR(5) NOT NULL,
CONSTRAINT Vehicles_PK PRIMARY KEY(ID),
CONSTRAINT Vehicles_UNQ_ID_Type UNIQUE(ID, [Type]),
CONSTRAINT Vehicles_CHK_ValidTypes CHECK([Type] IN ('Car', 'Truck'))
);
This snippet raises a few questions for me.
Why is it necessary to include both ID and Type in the unique constraint? If just ID is unique, then the combination of the two columns will always be unique as well.
Also, I know how to set a primary key in SSMS and specify whether it is unique. But how would I specify a primary key on one column and a unique constraint on a combination of columns? Does this create two indexes?
This came up because I'm trying to implement similar code, which does not create a composite primary key, and I get the following error. So I'm trying to understand this code better.
The columns in table 'MyTable' do not match an existing primary key or UNIQUE constraint.
EDIT
I was able to get this working by simply creating a composite primary key in MyTable. The actual table definition is shown below. Again, this works. But it is not the same as the code quoted above. And I'm not sure if it would be better if I did it the other way.
CREATE TABLE [dbo].[MessageThread](
[Id] [int] IDENTITY(1,1) NOT NULL,
[MessageThreadType] [int] NOT NULL,
CONSTRAINT [PK_MessageThread_1] PRIMARY KEY CLUSTERED
(
[Id] ASC,
[MessageThreadType] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[MessageThread] WITH CHECK ADD CONSTRAINT [CK_MessageThread_ValidType] CHECK (([MessageThreadType]=(2) OR [MessageThreadType]=(1)))
GO
ALTER TABLE [dbo].[MessageThread] CHECK CONSTRAINT [CK_MessageThread_ValidType]
GO
1. I am not sure of the specific purpose of the given schema, but note that a unique constraint can be applied for multiple reasons, most commonly (a) to enforce uniqueness and (b) to give the optimizer more information on which to base its decisions.
2. A unique constraint does not create two indexes. It creates a single index with one of the columns as the leading key column, and it enforces uniqueness on the combination of both columns. So a unique constraint on (a, b) could allow:
a b
---- ----
1 1
1 2
2 1
2 2
Notice that neither column is unique on its own. I am not a big fan of using the table designer in SSMS (it has tons of bugs and doesn't support all functionality), but here is how to do it:
a) right-click the grid and choose Indexes/Keys...
b) choose multiple columns using the [...] button in the Columns grid
c) change Type to Unique Key
d) change the Name if desired
Here's an example of a table that already has a primary key. I could add one or more unique indexes if I wanted to:
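The same two-column unique key can also be declared in T-SQL instead of the designer. This sketch assumes SQL Server 2016+ and a hypothetical UniqueDemo table. The four (a, b) rows from the example are accepted and a repeat of (1, 1) is rejected. The constraint is backed by a single index with two key columns.

```sql
-- Sketch: hypothetical table, SQL Server 2016+.
DROP TABLE IF EXISTS dbo.UniqueDemo;

CREATE TABLE dbo.UniqueDemo (
    ID int NOT NULL CONSTRAINT PK_UniqueDemo PRIMARY KEY,
    a  int NOT NULL,
    b  int NOT NULL
);

-- One constraint over both columns (T-SQL equivalent of the Indexes/Keys dialog).
ALTER TABLE dbo.UniqueDemo ADD CONSTRAINT UQ_UniqueDemo_a_b UNIQUE (a, b);

-- Neither column is unique on its own, but each (a, b) pair is.
INSERT INTO dbo.UniqueDemo (ID, a, b) VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 2, 2);

BEGIN TRY
    INSERT INTO dbo.UniqueDemo (ID, a, b) VALUES (5, 1, 1);  -- repeats the pair (1, 1)
END TRY
BEGIN CATCH
    PRINT 'Rejected: ' + ERROR_MESSAGE();
END CATCH;

-- The constraint is backed by one index whose key has two columns.
SELECT i.name, COUNT(*) AS key_columns
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id AND ic.index_id = i.index_id
WHERE i.object_id = OBJECT_ID(N'dbo.UniqueDemo') AND i.name = N'UQ_UniqueDemo_a_b'
GROUP BY i.name;
```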
In my understanding, the reason for the unique constraint on (ID, [Type]) is to let detail tables reference (ID, [Type]) as a foreign key. A parent table is required to have a unique (or primary key) constraint on the columns used by a foreign key. For instance, the table in the question can have 2 detail tables:
CREATE TABLE dbo.CARS(
-- ... car-specific columns ...
vehicle_id INT NOT NULL,
[Type] VARCHAR(5) NOT NULL,
CONSTRAINT CAR_CHK_TYPE CHECK ([Type]='Car'),
CONSTRAINT CAR_FK_VEHICLE FOREIGN KEY (vehicle_id,[Type]) REFERENCES dbo.Vehicles(ID,[Type]));
CREATE TABLE dbo.TRUCKS(
-- ... truck-specific columns ...
vehicle_id INT NOT NULL,
[Type] VARCHAR(5) NOT NULL,
CONSTRAINT TRUCK_CHK_TYPE CHECK ([Type]='Truck'),
CONSTRAINT TRUCK_FK_VEHICLE FOREIGN KEY (vehicle_id,[Type]) REFERENCES dbo.Vehicles(ID,[Type]));
This way, CARS holds details only for vehicles of type 'Car', whereas TRUCKS holds details only for type 'Truck'.
Such a design is used to avoid a polymorphic relationship, for instance:
CREATE TABLE dbo.VEHICLE (
-- ...
ref_id INT NOT NULL,
-- PK of the 'master' table
ref_name VARCHAR(20) NOT NULL
-- here we put 'truck' or 'car', so we virtually have 2 parents;
-- in this case we cannot use an FK constraint; the only thing that may
-- somehow enforce the logical constraint is writing a trigger
);
Update
Your updated table definition looks good to me. I guess the sample table was initially designed for Oracle and then ported to SQL Server. In Oracle, the unique constraint and the primary key can use the same index, so there is no penalty for having both.
Good question. Theoretically you're right; there is no reason for it: a record can always be uniquely identified by its PK, and the unique constraint will always be satisfied as long as that is true. However, if ID and Type have some relationship outside the bounds of the data layer (maybe this table is the data model for an enum?), then it's unlikely that there would be two different IDs with the same Type, because the uniqueness of Type is enforced elsewhere. The constraint also sets up an index that includes both ID and Type, making the table relatively efficient to query by that combination of columns.
You set up a unique constraint using the "Manage Indexes and Keys" option. Yes, this will create an index and unique constraint for the primary key, and an index and unique constraint for the combination of PK and Type.
I suspect the reason for having both columns in the UNIQUE constraint is related to the error message you mentioned. SQL Server (in common with other SQL DBMSs) has a limitation that a FOREIGN KEY constraint can only reference exactly the set of columns defined by a uniqueness constraint. So if a FOREIGN KEY constraint references two columns then those two columns must have a uniqueness constraint on them - even if other constraints already guarantee uniqueness. This is a pointless limitation but it is part of standard SQL.
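The limitation can be reproduced in a few lines. This sketch assumes SQL Server 2016+ and uses hypothetical VehicleParent and CarDetail tables. The first attempt to add the two-column foreign key fails, even though ID alone is the primary key. The same statement succeeds once a UNIQUE constraint on exactly (ID, [Type]) exists. The error text is printed rather than asserted.

```sql
-- Sketch: hypothetical tables, SQL Server 2016+.
DROP TABLE IF EXISTS dbo.CarDetail;
DROP TABLE IF EXISTS dbo.VehicleParent;

CREATE TABLE dbo.VehicleParent (
    ID INT NOT NULL CONSTRAINT VehicleParent_PK PRIMARY KEY,
    [Type] VARCHAR(5) NOT NULL
);

CREATE TABLE dbo.CarDetail (
    ID INT NOT NULL CONSTRAINT CarDetail_PK PRIMARY KEY,
    [Type] VARCHAR(5) NOT NULL
);

-- (ID, [Type]) is unique in practice because ID is the primary key, but no
-- primary key, unique constraint or unique index covers exactly this column list.
BEGIN TRY
    ALTER TABLE dbo.CarDetail ADD CONSTRAINT CarDetail_FK_VehicleParent
        FOREIGN KEY (ID, [Type]) REFERENCES dbo.VehicleParent (ID, [Type]);
END TRY
BEGIN CATCH
    PRINT 'First attempt failed: ' + ERROR_MESSAGE();
END CATCH;

-- Declaring the "redundant" UNIQUE constraint makes the same FK legal.
ALTER TABLE dbo.VehicleParent
    ADD CONSTRAINT VehicleParent_UQ_ID_Type UNIQUE (ID, [Type]);

ALTER TABLE dbo.CarDetail ADD CONSTRAINT CarDetail_FK_VehicleParent
    FOREIGN KEY (ID, [Type]) REFERENCES dbo.VehicleParent (ID, [Type]);
```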
The following example is quite similar and explains why a composite foreign key and nested uniqueness constraints can be useful.
http://consultingblogs.emc.com/davidportas/archive/2007/01/08/Distributed-Keys-and-Disjoint-Subtypes.aspx
Here you go:
Cars and trucks have different attributes, so they do not belong in one table. This is why I have two tables, Cars and Trucks.
Yet cars and trucks share some attributes, such as VIN (vehicle identification number). More to the point, VIN is unique. This is why I need a table Vehicles. A vehicle cannot be both a car and a truck, so I must make sure it is not possible to enter both (VIN=123456789, Type=Car) and (VIN=123456789, Type=Truck). This is why I have a PK on VIN only.
I must ensure that a vehicle cannot have corresponding rows in both the Cars and Trucks tables. This is why I have a Type column in Cars and Trucks, and this is why I want (VIN, Type) in the child tables Cars and Trucks to refer to the parent table Vehicles. The only reason why I need an additional unique constraint on (VIN, Type) is this: it is referred to by FK constraints from child tables.
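Here is a sketch of the disjoint-subtype pattern described above. It assumes SQL Server 2016+ and uses the article's Vehicles table with ID in place of VIN. SeatCount, PayloadKg and the constraint names on the child tables are hypothetical. Each child pins [Type] to a single value and references (ID, [Type]), so a vehicle registered as a car cannot also get a Trucks row.

```sql
-- Sketch: hypothetical child columns and constraint names, SQL Server 2016+.
DROP TABLE IF EXISTS dbo.Cars;
DROP TABLE IF EXISTS dbo.Trucks;
DROP TABLE IF EXISTS dbo.Vehicles;

-- Parent table as in the article.
CREATE TABLE dbo.Vehicles (
    ID INT NOT NULL,
    [Type] VARCHAR(5) NOT NULL,
    CONSTRAINT Vehicles_PK PRIMARY KEY (ID),
    CONSTRAINT Vehicles_UNQ_ID_Type UNIQUE (ID, [Type]),
    CONSTRAINT Vehicles_CHK_ValidTypes CHECK ([Type] IN ('Car', 'Truck'))
);

-- Each child pins [Type] to one value and references (ID, [Type]).
CREATE TABLE dbo.Cars (
    ID INT NOT NULL CONSTRAINT Cars_PK PRIMARY KEY,
    [Type] VARCHAR(5) NOT NULL CONSTRAINT Cars_DF_Type DEFAULT ('Car'),
    SeatCount INT NULL,                      -- hypothetical car-only attribute
    CONSTRAINT Cars_CHK_Type CHECK ([Type] = 'Car'),
    CONSTRAINT Cars_FK_Vehicles FOREIGN KEY (ID, [Type])
        REFERENCES dbo.Vehicles (ID, [Type])
);

CREATE TABLE dbo.Trucks (
    ID INT NOT NULL CONSTRAINT Trucks_PK PRIMARY KEY,
    [Type] VARCHAR(5) NOT NULL CONSTRAINT Trucks_DF_Type DEFAULT ('Truck'),
    PayloadKg INT NULL,                      -- hypothetical truck-only attribute
    CONSTRAINT Trucks_CHK_Type CHECK ([Type] = 'Truck'),
    CONSTRAINT Trucks_FK_Vehicles FOREIGN KEY (ID, [Type])
        REFERENCES dbo.Vehicles (ID, [Type])
);

INSERT INTO dbo.Vehicles (ID, [Type]) VALUES (1, 'Car'), (2, 'Truck');
INSERT INTO dbo.Cars (ID, SeatCount) VALUES (1, 5);
INSERT INTO dbo.Trucks (ID, PayloadKg) VALUES (2, 3500);

-- Vehicle 1 is a car: the Trucks row would be (1, 'Truck'), which does not
-- exist in Vehicles, so the foreign key rejects it.
BEGIN TRY
    INSERT INTO dbo.Trucks (ID, PayloadKg) VALUES (1, 1000);
END TRY
BEGIN CATCH
    PRINT 'Rejected: ' + ERROR_MESSAGE();
END CATCH;
```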
BTW, you could leave a comment on the blog - in that case sqlblog would send me a message. It is a coincidence that I noticed your question here; I was supposed to go skiing, only there is no snow.
