composite vs surrogate primary key

I am designing a database with the following requirements:
An organization can exist on its own
An organization can have any number of distinct terms (date range)
An organization can have any number of survey types (student, teacher, parent, etc)
A survey form is assigned a term and survey type
A structure for this might be:
Organization
- OrganizationId INT IDENTITY(1,1) NOT NULL PRIMARY KEY
Term
- TermId INT IDENTITY(1,1) NOT NULL PRIMARY KEY
- OrganizationId INT NOT NULL REFERENCES Organization(OrganizationId)
SurveyType
- SurveyTypeId INT IDENTITY(1,1) NOT NULL PRIMARY KEY
- OrganizationId INT NOT NULL REFERENCES Organization(OrganizationId)
SurveyForm
- SurveyFormId INT IDENTITY(1,1) NOT NULL PRIMARY KEY
- SurveyTypeId INT NOT NULL REFERENCES SurveyType(SurveyTypeId)
- TermId INT NOT NULL REFERENCES Term(TermId)
That structure keeps with what seems to be a popular emphasis on a single surrogate primary key. However, it sacrifices data integrity, because nothing stops a SurveyForm record from having a TermId and a SurveyTypeId that belong to different Organizations.
To address data integrity, it would seem you have to add OrganizationId to SurveyForm and reference the composite keys (OrganizationId, SurveyTypeId) and (OrganizationId, TermId). That is somewhat tolerable in this example, but as the schema becomes more complete, the composite keys grow.
So my question is: how do people generally approach this now (most references online are from 2008, when I think it's possible there were different database design concerns)? As a corollary, when is it acceptable to add foreign keys to a table to reduce the number of tables joined for common expressions?

Academically speaking, you can migrate the Organization key along both lineages. That's just 4 bytes, after all:
create table dbo.Organization (
OrganizationId INT IDENTITY(1,1) PRIMARY KEY
);
go
create table dbo.Term (
TermId INT IDENTITY(1,1) NOT NULL,
OrganizationId INT NOT NULL REFERENCES dbo.Organization(OrganizationId),
primary key (OrganizationId, TermId)
);
go
create table dbo.SurveyType (
SurveyTypeId int IDENTITY(1,1) NOT NULL,
OrganizationId INT NOT NULL REFERENCES dbo.Organization(OrganizationId),
primary key (OrganizationId, SurveyTypeId)
);
go
create table dbo.SurveyForm (
SurveyFormId INT IDENTITY(1,1) NOT NULL,
OrganizationId int not null,
SurveyTypeId INT NOT NULL,
TermId INT NOT NULL,
primary key (OrganizationId, SurveyTypeId, TermId),
foreign key (OrganizationId, TermId) references dbo.Term (OrganizationId, TermId),
foreign key (OrganizationId, SurveyTypeId) references dbo.SurveyType (OrganizationId, SurveyTypeId)
);
go
These tables definitely violate some NF, I don't remember which one exactly, but I'm sure you can handle it yourself.
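To see what the composite foreign keys buy you, here is a small sketch that tries to mix two organizations in one SurveyForm row. It assumes the four dbo tables above have been created exactly as shown; the variable names are made up for the illustration.

```sql
-- Illustration only: assumes the four composite-key tables above exist.
DECLARE @OrgA INT, @OrgB INT, @TermA INT, @TypeB INT;

-- Two separate organizations.
INSERT INTO dbo.Organization DEFAULT VALUES;
SET @OrgA = SCOPE_IDENTITY();
INSERT INTO dbo.Organization DEFAULT VALUES;
SET @OrgB = SCOPE_IDENTITY();

-- A term owned by organization A and a survey type owned by organization B.
INSERT INTO dbo.Term (OrganizationId) VALUES (@OrgA);
SET @TermA = SCOPE_IDENTITY();
INSERT INTO dbo.SurveyType (OrganizationId) VALUES (@OrgB);
SET @TypeB = SCOPE_IDENTITY();

-- Mixing organizations: there is no (@OrgA, @TypeB) row in dbo.SurveyType,
-- so the foreign key to dbo.SurveyType should reject this insert.
INSERT INTO dbo.SurveyForm (OrganizationId, SurveyTypeId, TermId)
VALUES (@OrgA, @TypeB, @TermA);
```

Because OrganizationId appears only once in SurveyForm and is shared by both foreign keys, the term and the survey type are forced to come from the same organization.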
While this design approach can almost be considered a must for a warehouse (esp. if you aggregate data from different sources), I would never recommend it for any real-life OLTP. A much simpler solution would be:
Perform all modifications via a stored procedure, which will have proper checks against this kind of possible discrepancy.
Make sure that no user has permission to add or modify data in dbo.SurveyForm directly, which would circumvent the business rules implemented in the aforementioned SP.
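A minimal sketch of such a procedure, assuming the single-surrogate-key tables from the question (dbo.Term, dbo.SurveyType and dbo.SurveyForm without an OrganizationId column). The procedure name and the error text are hypothetical, not part of the original answer.

```sql
-- Sketch only: assumes the surrogate-key tables from the question exist.
-- dbo.SurveyForm_Insert is a hypothetical name.
CREATE PROCEDURE dbo.SurveyForm_Insert
    @SurveyTypeId INT,
    @TermId INT
AS
BEGIN
    SET NOCOUNT ON;

    -- Reject the insert unless both parent rows exist and share an organization.
    IF NOT EXISTS (
        SELECT 1
        FROM dbo.Term AS t
        JOIN dbo.SurveyType AS st
            ON st.OrganizationId = t.OrganizationId
        WHERE t.TermId = @TermId
          AND st.SurveyTypeId = @SurveyTypeId
    )
    BEGIN
        RAISERROR('Term and SurveyType must belong to the same Organization.', 16, 1);
        RETURN;
    END;

    INSERT INTO dbo.SurveyForm (SurveyTypeId, TermId)
    VALUES (@SurveyTypeId, @TermId);

    -- Hand the generated key back to the caller.
    SELECT SCOPE_IDENTITY() AS SurveyFormId;
END;
```

The second point would then be a matter of granting EXECUTE on the procedure and withholding INSERT and UPDATE on dbo.SurveyForm from ordinary users. An equivalent check would also be needed on any path that updates existing rows.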

I think there could be a way to avoid circular references, firstly by defining what really depends on what and removing redundant dependencies.
The question is: are Organizations allowed to be associated with Terms independently of any Survey association? I wonder whether Organizations really need to be associated with a Term directly, or only indirectly through Surveys. If, for example, an Organization CANNOT be associated with a Term that is not associated with one of the Organization's Surveys, then the Organization-Term relationship is redundant; if it is the other way around, then the Organization-SurveyType relationship is not needed.

Related

Foreign key to table A and B, where A already have a foreign key to B

Suppose there is a table called Accounts:
CREATE TABLE Accounts
(
[Id] int not null primary key identity(1,1),
[Username] varchar(20) not null unique,
[Password] varchar(20) not null
)
Then, there is another table called Characters. Each account can have N characters, so I can use a foreign key to link the characters to their account.
CREATE TABLE Characters
(
[AccountId] int not null foreign key references Accounts([Id]),
[Id] int not null primary key identity(1,1),
[Nickname] varchar(20) not null unique,
[Level] int not null default 0
)
Each character can have multiple pieces of equipment (inventory), so there is an Equipments table.
Since each piece of equipment is linked to a character, I should use a foreign key again, and that is where the problem comes in.
My coworker and I were arguing about which foreign key to use.
Since each character has a unique Id, I told him that a foreign key to that Id would be enough, as follows:
CREATE TABLE Equipments
(
[CharId] int not null foreign key references Characters([Id]),
[ItemId] int not null
)
He told me that we must use a foreign key to the character id AND the account id, as follows:
CREATE TABLE Equipments
(
[AccountId] int not null foreign key references Accounts([Id]), /*is this necessary?*/
[CharId] int not null foreign key references Characters([Id]),
[ItemId] int not null
)
I'm no expert in SQL Server, and in my opinion the foreign key to the account id is completely unnecessary, but he keeps telling me that we must use it and that it will help performance because the more foreign keys you use, the better.
So, should I use foreign keys to both the account id and the character id, or is the character id good enough?

As you said, there is a one-to-many relationship between Account and Character (and hence a character cannot belong to more than one account).
Similarly, as you described, each record in Equipments corresponds to exactly one record in Characters. The relationship from Account to Equipments can therefore be inferred, so there is no need for an extra column in the Equipments table. Data integrity is also preserved by the two foreign keys already created, so going without the AccountId column in Equipments should not be a problem.
Regarding the performance argument, this is decided case by case and depends on a lot of other things (number of records, business logic, ...). An unnecessary foreign key can even hurt performance, since the server has to maintain that foreign key as it operates. Also, I have found that if you do not have the key and later find out that you need it, it is easier to add one than to remove an existing one, especially when you have to create a whole new column for it (this last point is merely personal opinion).

You should add it only if you plan to query equipment directly by account, which is faster than joining to Accounts via Characters. Otherwise, no, you shouldn't use it.

You are correct, but for a more important reason.
If you include AccountId in the Equipments table, then you have a second relationship to the Accounts table. Perhaps this is allowed, but in all likelihood you intend Characters.AccountId to be the account id for a row in Equipments.
You would then get the appropriate account id by joining Equipments to Characters.
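A sketch of that join, assuming the Accounts and Characters tables and the two-column Equipments table from the question. The @AccountId variable and its value are placeholders for the example.

```sql
-- Assumes the question's Characters table and the Equipments table
-- that has only CharId and ItemId. @AccountId is a placeholder value.
DECLARE @AccountId INT = 1;

-- The account is reached through the owning character, so Equipments
-- does not need its own AccountId column.
SELECT c.AccountId,
       e.CharId,
       e.ItemId
FROM Equipments AS e
JOIN Characters AS c
    ON c.[Id] = e.CharId
WHERE c.AccountId = @AccountId;
```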

How to ensure uniqueness in many-to-many relationship table?

Users have many roles, roles have many users.
In USERS_ROLES table, have 3 columns: USERS_ROLES_ID, USER_ID, ROLE_ID
Usually USERS_ROLES_ID is just sequentially generated. Someone told me it's supposed to guarantee that user_id and role_id cross product are unique, so the primary key USERS_ROLES_ID should actually be some sort of combination of both USER_ID and ROLE_ID. How is this done, usually? (for example, USER_ID * (big number here) + ROLE_ID)?? Every example I could find uses a naive sequential primary key generation of the many-to-many join table.

Having a sequentially generated USERS_ROLES_ID primary key will not guarantee a unique combination of USER_ID and ROLE_ID. Adding a unique index on (USER_ID, ROLE_ID) will.
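For instance, assuming the USERS_ROLES table described in the question already exists and holds no duplicate pairs, a unique constraint (which SQL Server backs with a unique index) could be added like this. The constraint name is made up for the example.

```sql
-- Assumes USERS_ROLES(USERS_ROLES_ID, USER_ID, ROLE_ID) already exists
-- and contains no duplicate (USER_ID, ROLE_ID) pairs.
-- UQ_USERS_ROLES_USER_ROLE is a hypothetical constraint name.
ALTER TABLE USERS_ROLES
    ADD CONSTRAINT UQ_USERS_ROLES_USER_ROLE UNIQUE ([USER_ID], [ROLE_ID]);
```

After this, a second row with the same user and role is rejected regardless of what USERS_ROLES_ID it would have received.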

Gerrat is right. I found the full answer here: http://weblogs.sqlteam.com/jeffs/archive/2007/08/23/composite_primary_keys.aspx
Create table CustomerProducts
(
Customer_ProductID int identity primary key,
CustomerID int references Customers(CustomerID) not null,
ProductID int references Products(ProductID) not null,
OrderLimit int not null
)
This is what I see in perhaps most of the databases that I’ve worked
with over the years. The reason for designing a table in this manner?
Honestly, I don’t know! I can only surmise that it is because of the
lack of understanding what a primary key of a table really is, and
that it can be something other than an identity and that it can be
comprised of more than just a single column. As I mentioned, it seems
that many database architects are simply not aware of this fact.
Consider instead the following design:
Create table CustomerProducts (
CustomerID int references Customers(CustomerID) not null,
ProductID int references Products(ProductID) not null,
OrderLimit int not null,
Primary key (CustomerID, ProductID) )
Notice here that we have eliminated the identity column, and have
instead defined a composite (multi-column) primary key as the
combination of the CustomerID and ProductID columns. Therefore, we do
not have to create an additional unique constraint. We also do not
need an additional identity column that really serves no purpose. We
have not only simplified our data model physically, but we’ve also
made it more logically sound and the primary key of this table
accurately explains what it is this table is modeling – the
relationship of a CustomerID to a ProductID.

Missing FK Relationship in Entity Framework Model

I had a lot of trouble implementing the technique described in an Alexander Kuznetsov article. Basically, the article describes a way to create a FK between one table and alternate tables, and still maintain full constraints on those relationships.
Here's part of Alexander's code:
CREATE TABLE dbo.Vehicles(
ID INT NOT NULL,
[Type] VARCHAR(5) NOT NULL,
CONSTRAINT Vehicles_PK PRIMARY KEY(ID),
CONSTRAINT Vehicles_UNQ_ID_Type UNIQUE(ID, [Type]),
CONSTRAINT Vehicles_CHK_ValidTypes CHECK([Type] IN ('Car', 'Truck'))
)
CREATE TABLE dbo.Cars(ID INT NOT NULL,
[Type] AS CAST('Car' AS VARCHAR(5)) PERSISTED,
OtherData VARCHAR(10) NULL,
CONSTRAINT Cars_PK PRIMARY KEY(ID),
CONSTRAINT Cars_FK_Vehicles FOREIGN KEY(ID, [Type])
REFERENCES dbo.Vehicles(ID, [Type])
)
I finally got it working after errors and confirmed bugs. But when I generate my EF models from the new schema, it is missing a relationship between two of my tables.
The problem is that, in order to have a FK on two columns, there must be an index or unique constraint on both those columns. However, in my case, I also have another table with a FK to a single column in the base table (Vehicles, in Alexander's code).
Since you cannot have more than one PK in a table, this means I cannot have a FK to a PK on both sides. The PK can be for one or two columns, and the other FK will need to reference the non-PK unique constraint.
Unfortunately, Entity Framework will only create relationships for you when there is a FK to a PK. That's the problem. Can someone who understands DB design better than I do spot any other alternatives here?
Note: I realize some will see the obvious fix as simply modifying the model to manually add the additional relationship. Unfortunately, we are using a database project and are constantly using automated systems to regenerate the project and model from an updated database. So manual steps are really not practical.

You can't have more than one PK, but you can have more than one unique constraint, and in SQL Server you can create a foreign key constraint that references a unique constraint (one or multiple columns). Here is an example of two tables that roughly look like your model.
CREATE TABLE dbo.Vehicles
(
VehicleID INT PRIMARY KEY,
[Type] VARCHAR(5) NOT NULL UNIQUE,
CONSTRAINT u1 UNIQUE(VehicleID, [Type])
);
CREATE TABLE dbo.Cars
(
CarID INT PRIMARY KEY,
VehicleID INT NOT NULL
FOREIGN KEY REFERENCES dbo.Vehicles(VehicleID),
[Type] VARCHAR(5) NOT NULL
FOREIGN KEY REFERENCES dbo.Vehicles([Type]),
CONSTRAINT fk1 FOREIGN KEY (VehicleID, [Type])
REFERENCES dbo.Vehicles(VehicleID, [Type])
);
Note that Cars has three foreign keys: one points to the PK of Vehicles (VehicleID), one points to the unique constraint on Vehicles([Type]), and one points to the multi-column unique constraint on Vehicles(VehicleID, [Type]). I realize this is not equivalent to what you are trying to do, but it should demonstrate that SQL Server, at least, is capable of doing everything you seem to want to do (I'm having a hard time working out what you're actually after, because you keep switching between what Alex did, what you're trying to do but failing, and what you've done successfully).
Are you saying that EF will not recognize a foreign key that references a unique constraint? If so, does that affect constraints that have more than one column, or all unique constraints? If this is the case, that's a shame, because it is certainly supported in SQL Server. Seems like this would either be a bug or an intentional omission (given that the standard doesn't strictly allow FKs against unique constraints). I wonder if there are any bugs reported on Connect?
I have no idea how to force EF to recognize it, but I do know that just about all the people I know who use database projects end up performing pre- or post-deployment modifications and these can be relatively automated.

Database design SQL Server

Say I have a database with multiple entities like person, company and conference, for which you have to keep track of, say, addresses. We can have multiple addresses for the same entity (person). One approach is to have a separate address table for each entity (person_address etc). Another approach is to have a single address table which has the primary key (Entity, id, address_type). In this approach we cannot use foreign keys from the address table to the entities.
So what is the better approach? Is there another way to do this?
thanks

From a logical modeling POV, your description highlights the fact that entities like person, company, conference etc. have a common trait: they have zero, one or more addresses. If you modeled this as a class hierarchy, you would perhaps create an Addressable class and have person, company and conference inherit from it. You can apply the same reasoning to your data model and have an addressable table with an addressable_entity_id. The person, company and conference entities would 'inherit' this table. There are three established ways to implement table inheritance:
Class Table Inheritance
Single Table Inheritance
Concrete Table Inheritance
So you could model your tables like this:
create table Addresses (AddressId int not null identity(1,1) primary key, ...);
create table Addressable (AddressableId int not null identity (1,1) primary key, ...);
create table AddressableAddress (
AddressId int not null,
AddressableId int not null,
constraint AddressableAddressAddressId
foreign key (AddressId) references Addresses(AddressId),
constraint AddressableAddressAddressableId
foreign key (AddressableId) references Addressable(AddressableId));
create table Person (PersonId int not null identity(1,1) primary key,
AddressableId int not null,
...,
constraint PersonAddressableAddressableId
foreign key (AddressableId) references Addressable (AddressableId));
create table Company (CompanyId int not null identity(1,1) primary key,
AddressableId int not null,
...,
constraint CompanyAddressableAddressableId
foreign key (AddressableId) references Addressable (AddressableId));
Of course you have to find the right balance between absolute relational normal form and actual usability. In the scheme I propose, for instance, in order to insert a new Person one first has to insert a row in Addressable, get the AddressableId, and then proceed to insert the person. This may or may not work for you. BTW, there is a way to do such an insert in one single statement, using the OUTPUT clause to chain two inserts:
insert into Addressable (AddressableType)
output inserted.AddressableId, @FirstName, @LastName
into Person (AddressableId, FirstName, LastName)
values (@AddressableTypePerson);
But now it is difficult to retrieve the newly inserted PersonId. Note also that SQL Server does not allow the target table of OUTPUT ... INTO to be on either side of a foreign key constraint, so this form only works if Person is not part of any foreign key.
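If both generated ids are needed, one alternative (not the single-statement form above) is two plain inserts inside a transaction, reading each identity value with SCOPE_IDENTITY(). The sketch below assumes the Addressable and Person tables exist with the AddressableType, FirstName and LastName columns used in the insert above; the data types and sample values are guesses made for the illustration.

```sql
-- Sketch only: column data types and sample values are assumptions.
SET XACT_ABORT ON;  -- roll the whole transaction back on a run-time error

DECLARE @AddressableTypePerson INT = 1,
        @FirstName VARCHAR(50) = 'Ada',
        @LastName VARCHAR(50) = 'Lovelace',
        @AddressableId INT,
        @PersonId INT;

BEGIN TRANSACTION;

-- Parent row first, then capture its generated key.
INSERT INTO Addressable (AddressableType)
VALUES (@AddressableTypePerson);
SET @AddressableId = SCOPE_IDENTITY();

-- Child row referencing the parent, then capture its key as well.
INSERT INTO Person (AddressableId, FirstName, LastName)
VALUES (@AddressableId, @FirstName, @LastName);
SET @PersonId = SCOPE_IDENTITY();

COMMIT TRANSACTION;

SELECT @AddressableId AS AddressableId, @PersonId AS PersonId;
```

This costs a second statement, but it keeps the foreign key from Person to Addressable in place and makes both ids available to the caller.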

Technically, if two people live at the same address, you would not be completely normalized with a single one-to-many detail table (TBLAddress) for the row in TBLPerson. However, if you want just one instance per physical address, you will incur the overhead of a many-to-many relation table, TBLPersonAddresses, which has an FK to TBLAddress.
I would say that unless you expect multiple people at the same address to be the norm, I would simply have TBLAddress, with a personID column, as a detail table of TBLPerson.
EDIT: And I tend to always use surrogate keys unless I have a specific reason not to do so.

Database design - composite key relationship issue

I had posted a similar question before, but this is more specific. Please have a look at the following diagram:
The explanation of this design is as follows:
Bakers produce many Products
The same Product can be produced by more than one Baker
Bakers change their pricing from time-to-time for certain (of their) Products
Orders can be created, but not necessarily finalised
The aim here is to allow the store manager to create an Order "Basket" based on whatever goods are required, and also allow the system being created to determine the best price at that time based on what Products are contained within the Order.
I therefore envisaged the ProductOrders table to initially hold the productID and associated orderID, whilst maintaining a null (undetermined) value for bakerID and pricingDate, as that would be determined and updated by the system, which would then constitute a finalised order.
Now that you have an idea of what I am trying to do, please advise me on how best to set these relationships up.
Thank you!

If I understand correctly, an unfinalised order is not yet assigned a baker / pricing (meaning when an order is placed, no baker has yet been selected to bake the product).
In which case, the order is probably placed against the Products Table and then "Finalized" against the BakersProducts table.
A solution could be to give ProductsOrders 2 separate "ProductID's", one being for the original ordered ProductId (i.e. Non Nullable) - say ProductId, and the second being part of the Foreign key to the assigned BakersProducts (say ProductId2). Meaning that in ProductsOrders, the composite foreign keys BakerId, ProductId2 and PricingDate are all nullable, as they will only be set once the order is Finalized.
In order to remove this redundancy, what you might also consider is using surrogate keys instead of the composite keys. This way BakersProducts would have a surrogate PK (e.g. BakersProductId) which would then be referenced as a nullable FK in ProductsOrders. This would also avoid the confusion with the Direct FK in ProductsOrders to Product.ProductId (which from above, was the original Product line as part of the Order).
HTH?
Edit:
CREATE TABLE dbo.BakersProducts
(
BakerProductId int identity(1,1) not null, -- New Surrogate PK here
BakerId int not null,
ProductId int not null,
PricingDate datetime not null,
Price money not null,
StockLevel bigint not null,
CONSTRAINT PK_BakerProducts PRIMARY KEY(BakerProductId),
CONSTRAINT FK_BakerProductsProducts FOREIGN KEY(ProductId) REFERENCES dbo.Products(ProductId),
CONSTRAINT FK_BakerProductsBaker FOREIGN KEY(BakerId) REFERENCES dbo.Bakers(BakerId),
CONSTRAINT U_BakerProductsPrice UNIQUE(BakerId, ProductId, PricingDate) -- Unique constraint mimics the original PK for uniqueness ... could also use a unique index
)
CREATE TABLE dbo.ProductOrders
(
OrderId INT NOT NULL,
ProductId INT NOT NULL, -- This is the original Ordered Product set when order is created
BakerProductId INT NULL, -- This is nullable and gets set when Order is finalised with a baker
OrderQuantity BIGINT NOT NULL,
CONSTRAINT FK_ProductsOrdersBakersProducts FOREIGN KEY(BakerProductId) REFERENCES dbo.BakersProducts(BakerProductId)
-- ... other keys here
)