I need to introduce a many-to-many relationship between two tables, which both have an integer for primary key, in a SQL Server database. How is this best done in T-SQL?
Consider the following two example table definitions for which there should be a many-to-many relationship:
CREATE TABLE [dbo].[Authors] (
[Id] INT IDENTITY (1, 1) NOT NULL,
CONSTRAINT [PK_Authors] PRIMARY KEY CLUSTERED ([Id] ASC)
);
CREATE TABLE [dbo].[Books] (
[Id] INT NOT NULL,
PRIMARY KEY CLUSTERED ([Id] ASC)
);
The traditional way is to use an additional many:many (junction) table, which links to both tables:
CREATE TABLE [dbo].[AuthorsBooks] (
-- Optionally, we can give the table its own surrogate PK
[Id] INT IDENTITY(1,1) NOT NULL,
AuthorId INT NOT NULL,
BookId INT NOT NULL,
-- Referential Integrity
FOREIGN KEY(AuthorId) REFERENCES Authors(Id),
FOREIGN KEY(BookId) REFERENCES Books(Id),
-- PK is either the surrogate ...
PRIMARY KEY CLUSTERED ([Id] ASC)
-- ... Or the compound key
-- PRIMARY KEY CLUSTERED (AuthorId, BookId)
);
One moot point is whether you want the compound key AuthorId, BookId to be the Primary Key, or whether to add your own new Surrogate - this is usually a subjective preference.
Some points to consider when choosing between a compound primary key and a new surrogate key for the junction table:
Without the surrogate, external tables linking to the junction table would need to store both columns of the compound key (i.e. would need to retain both AuthorId and BookId as foreign key columns).
So a new surrogate offers the potential benefit of a narrower primary key, which then means any tables linking to this junction table will have a single, narrower foreign key.
However, with the compound key there can be an optimisation benefit: tables referencing the junction table already hold AuthorId and BookId, so they can join directly to the underlying Books or Authors tables without first joining to the junction table.
The following diagram hopefully makes the case of the compound key clearer (the middle table Nationality is a junction table of PersonCountry):
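The compound-key case can also be sketched in T-SQL. This is only an illustration: it assumes the compound-key variant of AuthorsBooks (PRIMARY KEY (AuthorId, BookId)), and the Royalties table with its columns is hypothetical, invented here to play the role of an external table.

```sql
-- Hypothetical child table; assumes AuthorsBooks uses PRIMARY KEY (AuthorId, BookId).
CREATE TABLE dbo.Royalties (
    Id       INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    AuthorId INT NOT NULL,
    BookId   INT NOT NULL,
    Amount   DECIMAL(10, 2) NOT NULL,
    -- Both columns of the compound key are needed to reference the junction row
    CONSTRAINT FK_Royalties_AuthorsBooks
        FOREIGN KEY (AuthorId, BookId)
        REFERENCES dbo.AuthorsBooks (AuthorId, BookId)
);

-- Because BookId is stored locally, Books can be joined without touching AuthorsBooks
SELECT r.Amount, b.Id AS BookId
FROM dbo.Royalties r
INNER JOIN dbo.Books b
    ON b.Id = r.BookId;
```

With the surrogate variant, Royalties would instead carry a single foreign key column pointing at AuthorsBooks.Id, and reaching Books would require joining through AuthorsBooks first.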
Usage is straightforward - if the link exists in the many:many table, then the relationship is deemed to exist. To test the existence, you 'join through' the link table e.g.
-- Find all books written by AuthorId 1234
SELECT b.*
FROM Books b
INNER JOIN AuthorsBooks ab
ON b.Id = ab.BookId
WHERE ab.AuthorId = 1234;
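Creating, querying from the other side, and removing a relationship follow the same pattern. The sketch below uses the table definitions above; the id values 1234 and 5678 are made up and would need to exist in Authors and Books.

```sql
-- Record that AuthorId 1234 wrote BookId 5678
-- (both rows must already exist, or the foreign keys reject the insert)
INSERT INTO AuthorsBooks (AuthorId, BookId)
VALUES (1234, 5678);

-- Find all authors of BookId 5678
SELECT a.*
FROM Authors a
INNER JOIN AuthorsBooks ab
    ON a.Id = ab.AuthorId
WHERE ab.BookId = 5678;

-- Remove the link again; the Authors and Books rows are untouched
DELETE FROM AuthorsBooks
WHERE AuthorId = 1234
  AND BookId = 5678;
```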
I have the following question: let's say we have a 1:n and an m:n relationship in Chen notation.
So 1 has a primary key and n also; where do I type the foreign key? In the n?
And the second question is about m:n: both have a primary key, and I need 1 more table because it's m:n. Do I type both primary keys as foreign keys in the 3rd table?
Example of a 1:n relationship : customers and orders
One customer may have several orders. In this situation, you want a column in the orders table with a foreign key that references the primary key of the customers table.
Sample DDL:
create table customers (
id int primary key,
name varchar(50),
email varchar(50)
);
create table orders (
id int primary key,
price float,
customer_id int foreign key references customers(id)
);
Example of a n:m relationship : books and authors
A book may be written by more than one author. An author may have written more than one book. You create a bridge table, also called junction table, called books_authors, to represent that relationship, and that contains foreign keys to the two other tables.
Sample:
create table books (
id int primary key,
name varchar(50)
);
create table authors (
id int primary key,
name varchar(50)
);
create table books_authors(
book_id int foreign key references books(id),
author_id int foreign key references authors(id),
constraint pk_books_authors primary key(book_id, author_id)
);
So 1 has a primary key and n also; where do I type the foreign key? In the n?
The foreign key lives in the n. One row on the 1 side can be related to many rows on the n side, but a column can't hold multiple values; you need one value per field. Each row on the n side points to exactly one row on the 1 side, so that is where the key goes.
And the second question is about m:n: both have a primary key, and I need 1 more table because it's m:n. Do I type both primary keys as foreign keys in the 3rd table?
Yes, because that 3rd table in effect has a many-to-one relationship to both tables.
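As a rough illustration of how the two kinds of foreign key are used, here are queries against the sample tables above (customers, orders, books, authors, books_authors). The customer id 1 is a made-up value.

```sql
-- 1:n: the foreign key sits on the n side, so orders are filtered directly
SELECT o.id, o.price
FROM orders o
WHERE o.customer_id = 1;

-- m:n: each book with its authors, joining through the bridge table
SELECT b.name AS book_name, a.name AS author_name
FROM books b
INNER JOIN books_authors ba
    ON ba.book_id = b.id
INNER JOIN authors a
    ON a.id = ba.author_id
ORDER BY b.name, a.name;
```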
I have a language map table in which each entry may have a foreign key relationship with one of a few different tables. What is the best schema to deal with this configuration?
Tables: LanguageMap, TableA, TableB
These are the two possibilities:
1. Lookup Column Method - No Foreign Key Constraints:
Create Table LanguageMap (
Id int not null primary key,
Language nvarchar not null,
Value nvarchar not null,
Type nvarchar not null, -- 'TableA', 'TableB', etc.
ForeignTableId int not null -- Is Foreign key to another table dependent on the type of the row.
)
2. Multiple Foreign Key Columns
create Table LanguageMap(
Id int not null primary key,
Language nvarchar not null,
Value nvarchar not null,
Type nvarchar not null, -- 'Activity', 'Verb', etc.
TableAId int null,
TableBId int null
)
alter table LanguageMap add constraint FK_LanguageMap_TableA
foreign key (TableAId) references TableA (Id)
alter table LanguageMap add constraint FK_LanguageMap_TableB
foreign key (TableBId) references TableB (Id)
alter table LanguageMap add constraint CK_LanguageMap_OneIsNotNull
check (TableAId is not null or TableBId is not null)
go
alter table LanguageMap add constraint CK_LanguageMap_OneIsNull
check (TableAId is null or TableBId is null)
go
The foreign key constraints are based on the question "Foreign Key for either-or column?"
There is another alternative, called "Shared Primary Key". You can look this up. If TableA, TableB, TableC, etc. all "inherit" their PK as a copy of the PK from some Master table, called "TableMaster" for example, then you can just use that as an FK in LanguageMap.
The correct joins will select the correct instances.
The shared primary key is often used in conjunction with a design pattern called "Class Table Inheritance". Without knowing what TableA, TableB, TableC, etc. are about, I can't say whether Class Table Inheritance is relevant to your case.
In any event, look up both "Shared Primary Key" and "Class Table Inheritance" for further reading.
There are tags with those names in this area.
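A minimal sketch of the shared primary key idea follows. It is a fresh illustration rather than a migration of the tables in the question: the Name columns, the MasterId column, and all column lengths are assumptions.

```sql
-- Master table: one row per entity of any kind that can be translated
CREATE TABLE TableMaster (
    Id int NOT NULL PRIMARY KEY
);

-- Each concrete table reuses the master Id as both its PK and an FK
CREATE TABLE TableA (
    Id int NOT NULL PRIMARY KEY REFERENCES TableMaster (Id),
    Name nvarchar(100) NOT NULL  -- placeholder for TableA-specific columns
);

CREATE TABLE TableB (
    Id int NOT NULL PRIMARY KEY REFERENCES TableMaster (Id),
    Name nvarchar(100) NOT NULL  -- placeholder for TableB-specific columns
);

-- LanguageMap needs only one real foreign key, to the master table
CREATE TABLE LanguageMap (
    Id int NOT NULL PRIMARY KEY,
    [Language] nvarchar(10) NOT NULL,
    [Value] nvarchar(400) NOT NULL,
    MasterId int NOT NULL REFERENCES TableMaster (Id)
);

-- Joining to TableA selects only the translations that belong to TableA rows
SELECT a.Id, lm.[Language], lm.[Value]
FROM TableA a
INNER JOIN LanguageMap lm
    ON lm.MasterId = a.Id;
```

The cost of this design is that every insert into TableA or TableB must first insert a row into TableMaster.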
What is the best way to make a simple many-to-many cross reference table which contains nothing but two columns which are themselves primary keys in other tables?
Does anyone have concrete evidence for or against creating a table with a single unique index, but no primary key? (Alternatives are detailed below).
Put another way: How does SQL Server internally uniquely identify rows a) that have a primary key and b) that do not have a primary key?
In detail:
Given the input tables:
CREATE TABLE Foo ( FooID bigint identity(1,1) not null primary key, other stuff... )
CREATE TABLE Bar ( BarID bigint identity(1,1) not null primary key, other stuff... )
The three basic options are (in all cases assume a foreign key is created on the FooID and BarID columns):
-- Option 1: Compound primary key
CREATE TABLE FooBarXRef (
FooID bigint not null
, BarID bigint not null
, PRIMARY KEY ( FooID, BarID )
, CONSTRAINT FK... etc
)
-- Option 2: Independent primary key + unique index
CREATE TABLE FooBarXRef (
FooBarXRefID bigint identity(1,1) not null primary key
, FooID bigint not null
, BarID bigint not null
, CONSTRAINT FK... etc
);
CREATE UNIQUE INDEX I_FooBarXRef_FooBar ON FooBarXRef ( FooID, BarID );
-- Option 3: Unique index, no explicit primary key:
CREATE TABLE FooBarXRef (
FooID bigint not null
, BarID bigint not null
, CONSTRAINT FK... etc
);
CREATE UNIQUE INDEX I_FooBarXRef_FooBar ON FooBarXRef ( FooID, BarID );
Is a separate identity PK on the xref table redundant, needlessly introducing another layer of constraint checking for the database engine?
On the other hand, are multi-column primary keys problematic? One proposed solution is to have the xref table contain only the two foreign keys and define a unique index on those columns, but not define a primary key at all.
I suspect that doing so will cause SQL Server to create an internal primary key for the purposes of uniquely identifying each row, thus yielding the same redundant constraints as if a primary key were defined explicitly, but I have no proof or documentation to support this. Other questions and answers suggest that there is not an internal primary key by default (i.e. no equivalent to the Oracle ROWID), since %%physloc%% only indicates where a row is currently stored and is thus subject to change. My intuition is that the engine must create something to uniquely identify a row in order to implement cursors, transactions, and concurrency.
The concept of a primary key is really about relational theory; maintaining referential integrity by building relationships across multiple tables. The SQL Server engine, by default, creates a unique clustered index when a primary key is built (assuming a clustered index doesn't exist at the moment).
It's this clustered index that defines a unique row at the leaf level. For tables that have a non-unique clustered index, SQL Server adds a 4-byte "uniquifier" to the end of your key.
TestTable1 Primary Key
TestTable2 Primary Key & Unique Non-Clustered
TestTable3 Unique Clustered
TestTable4 Primary Clustered (same as Table1 & Table3; since a primary key CAN also be defined on a non-clustered index, I prefer this form because it always states which structure I want).
TestTable2 is redundant: it creates a unique clustered index to store all the records at its leaf level, and then creates a unique non-clustered index to enforce uniqueness once again. Any change to the table will hit the clustered index and then the non-clustered one.
TestTable1, TestTable3, TestTable4 are a tie in my book, a unique clustered index structure is created on all. There is no physical difference in the way records are stored on a page.
However, SQL Server Replication requires all replicated tables to have a primary key. If you'll be using Replication in the future, you may want to make sure all your unique clustered indexes are primary keys as well.
I seem to be unable to paste in my verifying scripts, so here they are on hastebin.
http://hastebin.com/qucajimixi.vbs
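The linked scripts are not reproduced here. As a stand-in (this is not the answerer's script), one way to check what SQL Server builds is to create one of the variants and query the sys.indexes catalog view. The table name XRefCheck is hypothetical, and the foreign keys are left out to keep the sketch self-contained.

```sql
-- Same shape as Option 3 in the question: unique index, no primary key
CREATE TABLE dbo.XRefCheck (
    FooID bigint NOT NULL,
    BarID bigint NOT NULL
);

CREATE UNIQUE INDEX I_XRefCheck_FooBar ON dbo.XRefCheck (FooID, BarID);

-- List the storage structures SQL Server created for the table
SELECT i.name, i.type_desc, i.is_unique, i.is_primary_key
FROM sys.indexes i
WHERE i.object_id = OBJECT_ID(N'dbo.XRefCheck');
```

Note that CREATE UNIQUE INDEX builds a nonclustered index unless CLUSTERED is specified, so this variant is stored as a heap plus a nonclustered unique index. To get the unique clustered structure described for TestTable3, the statement would need to be CREATE UNIQUE CLUSTERED INDEX.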
Well, it all depends on the requirement. As far as I know
PRIMARY KEY = UNIQUE KEY + NOT NULL
What this tells you is that you can have multiple NOT NULL unique indexes (non-clustered), but you cannot have multiple primary keys in a table (clustered).
I am a huge believer in the relational database model and in working with primary key/foreign key relationships. DB replication requires you to have a primary key on a table; therefore, it is always good practice to create a primary key instead of unique keys for your table.
So, I've got a table created like so:
create table CharacterSavingThrow
(
CharacterCode int not null,
constraint FK_CharacterSavingThrowCharacterID foreign key (CharacterCode) references Character(CharacterCode),
FortitudeSaveCode int not null,
constraint FK_CharacterSavingThrowFortitudeSaveCode foreign key (FortitudeSaveCode) references SavingThrow(SavingThrowCode),
ReflexSaveCode int not null,
constraint FK_CharacterSavingThrowReflexSaveCode foreign key (ReflexSaveCode) references SavingThrow(SavingThrowCode),
WillSaveCode int not null,
constraint FK_CharacterSavingThrowWillSaveCode foreign key (WillSaveCode) references SavingThrow(SavingThrowCode),
constraint PK_CharacterSavingThrow primary key clustered (CharacterCode, FortitudeSaveCode, ReflexSaveCode, WillSaveCode)
)
I need to know how I would reference the primary key of this table from another table's constraint? Seems like a pretty simple question, either it's possible or not, right? Thanks for your guys's help!
Yes, totally easy. You just have to specify the complete compound key, i.e. your other table also needs to have those four columns that make up the PK here, and then the FK constraint would be:
ALTER TABLE dbo.YourOtherTable
ADD CONSTRAINT FK_YourOtherTable_CharacterSavingThrow
FOREIGN KEY(CharacterCode, FortitudeSaveCode, ReflexSaveCode, WillSaveCode)
REFERENCES dbo.CharacterSavingThrow(CharacterCode, FortitudeSaveCode, ReflexSaveCode, WillSaveCode)
The point is: if you have a compound primary key (made up of more than one column), any other table wanting to reference that table also must have all those columns and use all those columns for the FK relationship.
Also, if you're writing queries that would join those two tables - you would have to use all columns contained in the compound PK for your joins.
That's one of the main drawbacks of using four columns as a PK - it makes FK relationships and JOIN queries awfully cumbersome and really annoying to write and use. For that reason, in such a case, I would probably opt to use a separate surrogate key in the table - e.g. introduce a new INT IDENTITY on your dbo.CharacterSavingThrow table to act as primary key, that would make it a lot easier to reference that table and write JOIN queries that use that table.
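A sketch of that surrogate-key variant might look like the following. The column name CharacterSavingThrowId and the constraint name UQ_CharacterSavingThrow are hypothetical, and dbo.YourOtherTable is assumed to already have an int column called CharacterSavingThrowId. The four foreign keys to Character and SavingThrow would stay as in the original definition and are omitted here for brevity.

```sql
create table dbo.CharacterSavingThrow
(
    -- Hypothetical surrogate key
    CharacterSavingThrowId int identity(1,1) not null,
    CharacterCode int not null,
    FortitudeSaveCode int not null,
    ReflexSaveCode int not null,
    WillSaveCode int not null,
    constraint PK_CharacterSavingThrow
        primary key clustered (CharacterSavingThrowId),
    -- Keep the original four-column combination unique
    constraint UQ_CharacterSavingThrow
        unique (CharacterCode, FortitudeSaveCode, ReflexSaveCode, WillSaveCode)
);

-- The referencing table now needs only one column for the relationship
alter table dbo.YourOtherTable
add constraint FK_YourOtherTable_CharacterSavingThrow
foreign key (CharacterSavingThrowId)
references dbo.CharacterSavingThrow (CharacterSavingThrowId);
```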
Would it be possible in SQL Server 2008 to have a table created with 2 columns that are at the same time primary and foreign keys? If yes, what would such code look like? I've searched and come up with nothing.
Sure, no problem:
CREATE TABLE dbo.[User]
(
Id int NOT NULL IDENTITY PRIMARY KEY,
Name nvarchar(1024) NOT NULL
);
CREATE TABLE [Group]
(
Id int NOT NULL IDENTITY PRIMARY KEY,
Name nvarchar(1024) NOT NULL
);
CREATE TABLE [UserToGroup]
(
UserId int NOT NULL,
GroupId int NOT NULL,
PRIMARY KEY CLUSTERED ( UserId, GroupId ),
FOREIGN KEY ( UserId ) REFERENCES [User] ( Id ) ON UPDATE NO ACTION ON DELETE CASCADE,
FOREIGN KEY ( GroupId ) REFERENCES [Group] ( Id ) ON UPDATE NO ACTION ON DELETE CASCADE
);
This is quite commonly used to model many-to-many relations.
These are totally different constructs.
A Primary Key is used to enforce uniqueness within a table, and be a unique identifier for a certain record.
A Foreign Key is used for referential integrity, to make sure that a value exists in another table.
The foreign key needs to reference the primary key (or a unique constraint) in another table.
If you want to have a foreign key that is also unique, you could make a FK constraint and add a unique index/constraint to that same field.
For reference purposes, SQL Server allows a FK to refer to a UNIQUE CONSTRAINT as well as to a PRIMARY KEY field.
It is probably not a good idea, since you often want to allow duplicate foreign keys in the table. Even if you don't now, you might in the future, so it is best not to do this. See "Is it fine to have foreign key as primary key?"
Just a quick note - from Microsoft pages (http://msdn.microsoft.com/en-us/library/ms189049.aspx)...
"A foreign key constraint does not have to be linked only to a primary key constraint in another table; it can also be defined to reference the columns of a UNIQUE constraint in another table."
Not used often, but useful in some circumstances.
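For illustration, a foreign key that references a UNIQUE constraint rather than a primary key could look like this. The Country and Address tables are hypothetical and invented only for this sketch.

```sql
CREATE TABLE dbo.Country
(
    Id int NOT NULL IDENTITY PRIMARY KEY,
    IsoCode char(2) NOT NULL,
    CONSTRAINT UQ_Country_IsoCode UNIQUE (IsoCode)
);

CREATE TABLE dbo.Address
(
    Id int NOT NULL IDENTITY PRIMARY KEY,
    CountryIso char(2) NOT NULL,
    -- References the UNIQUE constraint on IsoCode, not the primary key
    CONSTRAINT FK_Address_Country
        FOREIGN KEY (CountryIso) REFERENCES dbo.Country (IsoCode)
);
```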