Database design - composite key relationship issue


I had posted a similar question before, but this is more specific. Please have a look at the following diagram:
The explanation of this design is as follows:
Bakers produce many Products
The same Product can be produced by more than one Baker
Bakers change their pricing from time-to-time for certain (of their) Products
Orders can be created, but not necessarily finalised
The aim here is to allow the store manager to create an Order "Basket" based on whatever goods are required, and also allow the system being created to determine the best price at that time based on what Products are contained within the Order.
I therefore envisaged the ProductOrders table to initially hold the productID and associated orderID, whilst maintaining a null (undetermined) value for bakerID and pricingDate, as that would be determined and updated by the system, which would then constitute a finalised order.
Now that you have an idea of what I am trying to do, please advise me on how best to set these relationships up.
Thank you!

If I understand correctly, an unfinalised order is not yet assigned a baker / pricing (meaning when an order is placed, no baker has yet been selected to bake the product).
In which case, the order is probably placed against the Products Table and then "Finalized" against the BakersProducts table.
A solution could be to give ProductsOrders 2 separate "ProductID's", one being for the original ordered ProductId (i.e. Non Nullable) - say ProductId, and the second being part of the Foreign key to the assigned BakersProducts (say ProductId2). Meaning that in ProductsOrders, the composite foreign keys BakerId, ProductId2 and PricingDate are all nullable, as they will only be set once the order is Finalized.
In order to remove this redundancy, what you might also consider is using surrogate keys instead of the composite keys. This way BakersProducts would have a surrogate PK (e.g. BakersProductId) which would then be referenced as a nullable FK in ProductsOrders. This would also avoid the confusion with the Direct FK in ProductsOrders to Product.ProductId (which from above, was the original Product line as part of the Order).
HTH?
Edit:
CREATE TABLE dbo.BakersProducts
(
BakerProductId int identity(1,1) not null, -- New Surrogate PK here
BakerId int not null,
ProductId int not null,
PricingDate datetime not null,
Price money not null,
StockLevel bigint not null,
CONSTRAINT PK_BakerProducts PRIMARY KEY(BakerProductId),
CONSTRAINT FK_BakerProductsProducts FOREIGN KEY(ProductId) REFERENCES dbo.Products(ProductId),
CONSTRAINT FK_BakerProductsBaker FOREIGN KEY(BakerId) REFERENCES dbo.Bakers(BakerId),
CONSTRAINT U_BakerProductsPrice UNIQUE(BakerId, ProductId, PricingDate) -- Unique constraint mimics the original PK for uniqueness ... could also use a unique index
)
CREATE TABLE dbo.ProductOrders
(
OrderId INT NOT NULL,
ProductId INT NOT NULL, -- This is the original Ordered Product set when order is created
BakerProductId INT NULL, -- This is nullable and gets set when Order is finalised with a baker
OrderQuantity BIGINT NOT NULL,
CONSTRAINT FK_ProductsOrdersBakersProducts FOREIGN KEY(BakerProductId) REFERENCES dbo.BakersProducts(BakerProductId)
-- .. Other Keys here
)
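To see the nullable surrogate FK in action without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It is not the T-SQL above: the types are SQLite's, the (OrderId, ProductId) primary key on ProductOrders is an assumption, and the finalise() helper with its "cheapest price wins" rule is purely hypothetical, standing in for whatever logic the system uses to pick a baker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE Products (ProductId INTEGER PRIMARY KEY);
CREATE TABLE Bakers (BakerId INTEGER PRIMARY KEY);
CREATE TABLE BakersProducts (
    BakerProductId INTEGER PRIMARY KEY,           -- surrogate key
    BakerId        INTEGER NOT NULL REFERENCES Bakers(BakerId),
    ProductId      INTEGER NOT NULL REFERENCES Products(ProductId),
    PricingDate    TEXT    NOT NULL,
    Price          NUMERIC NOT NULL,
    UNIQUE (BakerId, ProductId, PricingDate)      -- the original composite key
);
CREATE TABLE ProductOrders (
    OrderId        INTEGER NOT NULL,
    ProductId      INTEGER NOT NULL REFERENCES Products(ProductId),
    BakerProductId INTEGER REFERENCES BakersProducts(BakerProductId),  -- nullable
    OrderQuantity  INTEGER NOT NULL,
    PRIMARY KEY (OrderId, ProductId)              -- assumed key for this sketch
);
INSERT INTO Products VALUES (1);
INSERT INTO Bakers VALUES (10), (11);
INSERT INTO BakersProducts VALUES (100, 10, 1, '2024-01-01', 2.50);
INSERT INTO BakersProducts VALUES (101, 11, 1, '2024-01-01', 2.10);
-- Unfinalised order line: no baker/pricing row has been chosen yet
INSERT INTO ProductOrders (OrderId, ProductId, OrderQuantity) VALUES (1, 1, 24);
""")


def finalise(conn, order_id):
    """Hypothetical rule: point each open line at the cheapest matching row."""
    conn.execute(
        """
        UPDATE ProductOrders
        SET BakerProductId = (
            SELECT bp.BakerProductId
            FROM BakersProducts AS bp
            WHERE bp.ProductId = ProductOrders.ProductId
            ORDER BY bp.Price ASC
            LIMIT 1)
        WHERE OrderId = ? AND BakerProductId IS NULL
        """,
        (order_id,),
    )


before = conn.execute(
    "SELECT BakerProductId FROM ProductOrders WHERE OrderId = 1").fetchone()[0]
finalise(conn, 1)
after = conn.execute(
    "SELECT BakerProductId FROM ProductOrders WHERE OrderId = 1").fetchone()[0]
print(before, after)
```

One thing the surrogate key does not give you for free: nothing in the schema forces the chosen BakersProducts row to be for the same ProductId as the order line, so that check has to live in the finalising code (as in the WHERE clause of the subquery here).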

Related

Creating a foreign key against a composite key in MS SQL Server

I'm trying to create a foreign key between two tables. Problem is one of those tables has a composite primary key..
My tables are products (one row per product) and product_price_history (many rows per product).
I have a composite key in product_price_history, which is product id and start date of a specific price for that product.
Here's my code :
CREATE TABLE products (
product_id INT IDENTITY(1,1) PRIMARY KEY,
product_name VARCHAR(50) NOT NULL,
product_desc VARCHAR(255) NULL,
product_group_id INT
)
CREATE TABLE product_price_history (
product_id INT NOT NULL,
start_date DATE NOT NULL,
end_date DATE NULL,
price NUMERIC (6,2) NOT NULL
)
ALTER TABLE product_price_history
ADD CONSTRAINT pk_product_id_start_dt
PRIMARY KEY (product_id,start_date)
Now I'm trying to create a foreign key between the products table and the product_price_history table but I can't because it's a composite key.
Also it doesn't make sense to add the start date (the other part of the foreign key) to the products table.
What's the best way to deal with this? Can I create a foreign key between these tables? Do I even NEED a foreign key?
My intentions here are
to enforce uniqueness of the product price information. A product can only have one price at any time.
to link these two tables so there's a logical join between them, and I can show this in a database diagram

The foreign key on the product_price_history table should only include product_id. Your goal is to ensure that any entry in product_price_history already has a "parent" entry in products. That has nothing to do with start_date.
The way I see this situation, in theory, a fully normalized version of the tables would hold current_price as a single value in the products table, and product_price_history would simply be a log table.
It's not necessary to do it this way, with a physical field, but thinking from this perspective helps to see where your table model is slightly de-normalized.
Also, if you make the product_price_history table anything but a simple log table, how do you ensure that a new start_date is later than the previous end_date? You can't express that with a primary key. What if you edit start_date later? I would even consider a different compound key for the product_price_history table, perhaps product_id + insert_date, or just an auto-increment id, while still keeping the foreign key relationship to products.product_id.
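A rough sketch of that advice, using Python's built-in sqlite3 module rather than SQL Server (so the types differ, and the sample rows and the try_insert() helper are made up for illustration). The foreign key covers product_id only, the composite primary key stays on (product_id, start_date), and the last insert shows the gap mentioned above: the key alone does not stop overlapping date ranges.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE products (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL
);
CREATE TABLE product_price_history (
    product_id INTEGER NOT NULL REFERENCES products(product_id),  -- FK on product_id only
    start_date TEXT    NOT NULL,
    end_date   TEXT    NULL,
    price      NUMERIC NOT NULL,
    PRIMARY KEY (product_id, start_date)
);
INSERT INTO products VALUES (1, 'Rye loaf');
INSERT INTO product_price_history VALUES (1, '2024-01-01', '2024-06-30', 3.20);
INSERT INTO product_price_history VALUES (1, '2024-07-01', NULL, 3.50);
""")


def try_insert(row):
    """Return 'ok' or the constraint error message for one history row."""
    try:
        conn.execute("INSERT INTO product_price_history VALUES (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)


orphan = try_insert((99, "2024-01-01", None, 1.00))    # no such product
duplicate = try_insert((1, "2024-07-01", None, 9.99))  # same product and start_date
overlap = try_insert((1, "2024-03-01", None, 4.00))    # falls inside an existing range
print(orphan, duplicate, overlap, sep="\n")
```

In SQL Server the equivalent constraint would be added with ALTER TABLE product_price_history ADD CONSTRAINT ... FOREIGN KEY (product_id) REFERENCES products (product_id). Rejecting overlapping ranges needs something beyond the key, such as a trigger or application-side checks.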

How to ensure uniqueness in many-to-many relationship table?

Users have many roles, roles have many users.
In USERS_ROLES table, have 3 columns: USERS_ROLES_ID, USER_ID, ROLE_ID
Usually USERS_ROLES_ID is just sequentially generated. Someone told me it's supposed to guarantee that user_id and role_id cross product are unique, so the primary key USERS_ROLES_ID should actually be some sort of combination of both USER_ID and ROLE_ID. How is this done, usually? (for example, USER_ID * (big number here) + ROLE_ID)?? Every example I could find uses a naive sequential primary key generation of the many-to-many join table.

Having a sequentially generated USERS_ROLES_ID primary key will not guarantee a unique combination of USER_ID and ROLE_ID. Adding a unique index on (USER_ID, ROLE_ID) will.
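A quick way to convince yourself, sketched with Python's built-in sqlite3 module (not the asker's actual DBMS; the index name UX_USERS_ROLES and the sample values are made up). With only the sequential key, the same pair goes in twice; once the unique index exists, the second copy is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE USERS_ROLES (
    USERS_ROLES_ID INTEGER PRIMARY KEY AUTOINCREMENT,  -- sequential surrogate key
    USER_ID        INTEGER NOT NULL,
    ROLE_ID        INTEGER NOT NULL
);
""")

# The surrogate key alone happily accepts the same (user, role) pair twice.
conn.execute("INSERT INTO USERS_ROLES (USER_ID, ROLE_ID) VALUES (1, 7)")
conn.execute("INSERT INTO USERS_ROLES (USER_ID, ROLE_ID) VALUES (1, 7)")
dupes_before = conn.execute(
    "SELECT COUNT(*) FROM USERS_ROLES WHERE USER_ID = 1 AND ROLE_ID = 7"
).fetchone()[0]

# Remove the duplicates (keep the lowest id per pair), then add the unique index.
conn.execute("""
    DELETE FROM USERS_ROLES
    WHERE USERS_ROLES_ID NOT IN (
        SELECT MIN(USERS_ROLES_ID) FROM USERS_ROLES GROUP BY USER_ID, ROLE_ID)
""")
conn.execute("CREATE UNIQUE INDEX UX_USERS_ROLES ON USERS_ROLES (USER_ID, ROLE_ID)")

try:
    conn.execute("INSERT INTO USERS_ROLES (USER_ID, ROLE_ID) VALUES (1, 7)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(dupes_before, rejected)
```

No arithmetic on the ids (USER_ID * big number + ROLE_ID) is needed; the index, or a composite primary key on the two columns, does the job.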

Gerrat is right. I found the full answer here: http://weblogs.sqlteam.com/jeffs/archive/2007/08/23/composite_primary_keys.aspx
Create table CustomerProducts
(
Customer_ProductID int identity primary key,
CustomerID int references Customers(CustomerID) not null,
ProductID int references Products(ProductID) not null,
OrderLimit int not null
)
This is what I see in perhaps most of the databases that I’ve worked
with over the years. The reason for designing a table in this manner?
Honestly, I don’t know! I can only surmise that it is because of the
lack of understanding what a primary key of a table really is, and
that it can be something other than an identity and that it can be
comprised of more than just a single column. As I mentioned, it seems
that many database architects are simply not aware of this fact.
Consider instead the following design:
Create table CustomerProducts (
CustomerID int references Customers(CustomerID) not null,
ProductID int references Products(ProductID) not null,
OrderLimit int not null,
Primary key (CustomerID, ProductID) )
Notice here that we have eliminated the identity column, and have
instead defined a composite (multi-column) primary key as the
combination of the CustomerID and ProductID columns. Therefore, we do
not have to create an additional unique constraint. We also do not
need an additional identity column that really serves no purpose. We
have not only simplified our data model physically, but we’ve also
made it more logically sound and the primary key of this table
accurately explains what it is this table is modeling – the
relationship of a CustomerID to a ProductID.

composite vs surrogate primary key

I am designing a database with the following requirements:
An organization can exist on its own
An organization can have any number of distinct terms (date range)
An organization can have any number of survey types (student, teacher, parent, etc)
A survey form is assigned a term and survey type
A structure for this might be:
Organization
- OrganizationId INT IDENTITY(1,1) NOT NULL PRIMARY KEY
Term
- TermId INT IDENTITY(1,1) NOT NULL PRIMARY KEY
- OrganizationId INT NOT NULL REFERENCES Organization(OrganizationId)
SurveyType
- SurveyTypeId INT IDENTITY(1,1) NOT NULL PRIMARY KEY
- OrganizationId INT NOT NULL REFERENCES Organization(OrganizationId)
SurveyForm
- SurveyFormId INT IDENTITY(1,1) NOT NULL PRIMARY KEY
- SurveyTypeId INT NOT NULL REFERENCES SurveyType(SurveyTypeId)
- TermId INT NOT NULL REFERENCES Term(TermId)
That structure keeps with what seems to be a popular emphasis on a single surrogate primary key. However that structure sacrifices data integrity because it is very easy for a SurveyForm record to have a TermId or SurveyTypeId from different Organizations.
To address data integrity, it would seem you would have to add OrganizationId and use it in the composite keys (OrganizationId, SurveyTypeId) and (OrganizationId, TermId). That is somewhat tolerable in this example but as the schema becomes more complete, the composite key sizes increase.
So my question is, how do people generally approach this now (most references online are from 2008, when I think it's possible there were different database design concerns)? As a corollary, when is it acceptable to add foreign keys to a table to reduce the number of tables joined for common expressions?

Academically speaking, you can migrate the Organization key along both lineages. That's just 4 bytes, after all:
create table dbo.Organization (
OrganizationId INT IDENTITY(1,1) PRIMARY KEY
);
go
create table dbo.Term (
TermId INT IDENTITY(1,1) NOT NULL,
OrganizationId INT NOT NULL REFERENCES dbo.Organization(OrganizationId),
primary key (OrganizationId, TermId)
);
go
create table dbo.SurveyType (
SurveyTypeId int IDENTITY(1,1) NOT NULL,
OrganizationId INT NOT NULL REFERENCES dbo.Organization(OrganizationId),
primary key (OrganizationId, SurveyTypeId)
);
go
create table dbo.SurveyForm (
SurveyFormId INT IDENTITY(1,1) NOT NULL,
OrganizationId int not null,
SurveyTypeId INT NOT NULL,
TermId INT NOT NULL,
primary key (OrganizationId, SurveyTypeId, TermId),
foreign key (OrganizationId, TermId) references dbo.Term (OrganizationId, TermId),
foreign key (OrganizationId, SurveyTypeId) references dbo.SurveyType (OrganizationId, SurveyTypeId)
);
go
These tables definitely violate some NF, I don't remember which one exactly, but I'm sure you can handle it yourself.
While this design approach can almost be considered a must for a warehouse (esp. if you aggregate data from different sources), I would never recommend it for any real-life OLTP. A much simpler solution would be:
Perform all modifications via a stored procedure, which will have proper checks against this kind of possible discrepancy.
Make sure that no user would have permissions to directly add / modify data in the dbo.SurveyForm, thus circumventing the business rules implemented in the aforementioned SP.
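For comparison, here is roughly what the composite-key variant above buys you, sketched with Python's built-in sqlite3 module instead of SQL Server (IDENTITY is dropped, the SurveyFormId column is left out for brevity, and the ids and the try_form() helper are invented for the example). Because OrganizationId is part of both foreign keys, a SurveyForm cannot mix a Term and a SurveyType from different organizations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE Organization (OrganizationId INTEGER PRIMARY KEY);
CREATE TABLE Term (
    TermId         INTEGER NOT NULL,
    OrganizationId INTEGER NOT NULL REFERENCES Organization(OrganizationId),
    PRIMARY KEY (OrganizationId, TermId)
);
CREATE TABLE SurveyType (
    SurveyTypeId   INTEGER NOT NULL,
    OrganizationId INTEGER NOT NULL REFERENCES Organization(OrganizationId),
    PRIMARY KEY (OrganizationId, SurveyTypeId)
);
CREATE TABLE SurveyForm (
    OrganizationId INTEGER NOT NULL,
    SurveyTypeId   INTEGER NOT NULL,
    TermId         INTEGER NOT NULL,
    PRIMARY KEY (OrganizationId, SurveyTypeId, TermId),
    FOREIGN KEY (OrganizationId, TermId)
        REFERENCES Term (OrganizationId, TermId),
    FOREIGN KEY (OrganizationId, SurveyTypeId)
        REFERENCES SurveyType (OrganizationId, SurveyTypeId)
);
INSERT INTO Organization VALUES (1), (2);
INSERT INTO Term (TermId, OrganizationId) VALUES (10, 1);        -- term 10: org 1
INSERT INTO SurveyType (SurveyTypeId, OrganizationId) VALUES (20, 1);  -- type 20: org 1
INSERT INTO SurveyType (SurveyTypeId, OrganizationId) VALUES (21, 2);  -- type 21: org 2
""")


def try_form(org_id, survey_type_id, term_id):
    """Return 'ok' or the constraint error for one SurveyForm row."""
    try:
        conn.execute(
            "INSERT INTO SurveyForm (OrganizationId, SurveyTypeId, TermId) "
            "VALUES (?, ?, ?)",
            (org_id, survey_type_id, term_id),
        )
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)


same_org = try_form(1, 20, 10)    # term and type both belong to org 1
wrong_type = try_form(1, 21, 10)  # type 21 belongs to org 2
wrong_term = try_form(2, 21, 10)  # term 10 belongs to org 1
print(same_org, wrong_type, wrong_term, sep="\n")
```

The stored-procedure approach enforces the same rule in code instead; the trade-off is wider keys versus trusting that every write goes through the procedure.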

I think there could be a way to avoid circular references, firstly by defining who really depends on who and removing redundant dependencies.
The question is... are Organizations allowed to be associated with Terms without regard to any Survey association? I wonder whether Organizations really need to be associated with a Term directly, or only indirectly through Surveys. If, for example, an Organization CANNOT be associated with a Term that is not associated with one of the Organization's Surveys, then the Organization-Term relationship is redundant; if it is the other way around, then the Organization-SurveyType relationship is not needed.

PRIMARY KEYs vs. UNIQUE Constraints

In an Alexander Kuznetsov article, he presents the following code snippet:
CREATE TABLE dbo.Vehicles(
ID INT NOT NULL,
[Type] VARCHAR(5) NOT NULL,
CONSTRAINT Vehicles_PK PRIMARY KEY(ID),
CONSTRAINT Vehicles_UNQ_ID_Type UNIQUE(ID, [Type]),
CONSTRAINT Vehicles_CHK_ValidTypes CHECK([Type] IN ('Car', 'Truck'))
);
This snippet raises a few questions for me.
Why is it necessary to include both ID and Type in the unique constraint? If just ID is unique, then the combination of the two columns will always be unique as well.
Also, I know how to set a primary key and specify whether it is unique in SSMS. But how would I specify a primary key on one column, and make a unique constraint on a combination of columns? Does this create two indexes?
This came up because I'm trying to implement similar code, which does not create a composite primary key, and I get the following error. So I'm trying to understand this code better.
The columns in table 'MyTable' do not match an existing primary key or UNIQUE constraint.
EDIT
I was able to get this working by simply creating a composite primary key in MyTable. The actual table definition is shown below. Again, this works. But it is not the same as the code quoted above. And I'm not sure if it would be better if I did it the other way.
CREATE TABLE [dbo].[MessageThread](
[Id] [int] IDENTITY(1,1) NOT NULL,
[MessageThreadType] [int] NOT NULL,
CONSTRAINT [PK_MessageThread_1] PRIMARY KEY CLUSTERED
(
[Id] ASC,
[MessageThreadType] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[MessageThread] WITH CHECK ADD CONSTRAINT [CK_MessageThread_ValidType] CHECK (([MessageThreadType]=(2) OR [MessageThreadType]=(1)))
GO
ALTER TABLE [dbo].[MessageThread] CHECK CONSTRAINT [CK_MessageThread_ValidType]
GO

1 : I am not sure of the specific purpose of the given schema. But note that a unique constraint can be applied for multiple reasons, most commonly: (a) to enforce uniqueness and (b) to provide the optimizer with more information to base decisions.
2 : A unique constraint does not create two indexes. It creates a single index with one of the columns as the leading key column. It enforces uniqueness on both. So a unique constraint on a,b could have:
a b
---- ----
1 1
1 2
2 1
2 2
Notice that neither of the columns enforce uniqueness individually. I am not a big fan of using the table designer in SSMS (it has tons of bugs and doesn't support all functionality) but here is how to do it:
a) right-click the grid and choose Indexes/Keys...
b) choose multiple columns using the [...] button in the Columns grid
c) change Type to Unique Key
d) change the Name if desired
Here's an example of a table that already has a primary key. I could add one or more unique indexes if I wanted to:

In my understanding, the reason for the unique constraint on ID,[Type] is to let detail tables reference (ID,[Type]) as a foreign key. Usually the parent table is required to have a unique constraint on the columns used for a foreign key. For instance, the table in the question can have 2 detail tables:
CREATE TABLE dbo.CARS(
....
vehicle_id INT NOT NULL,
[Type] VARCHAR(5) NOT NULL,
CONSTRAINT CAR_CHK_TYPE CHECK ([Type]='Car'),
CONSTRAINT CAR_FK_VEHICLE FOREIGN KEY (vehicle_id,[Type]) REFERENCES dbo.Vehicles(ID,[Type]));
CREATE TABLE dbo.TRUCKS(
....
vehicle_id INT NOT NULL,
[Type] VARCHAR(5) NOT NULL,
CONSTRAINT TRUCK_CHK_TYPE CHECK ([Type]='Truck'),
CONSTRAINT TRUCK_FK_VEHICLE FOREIGN KEY (vehicle_id,[Type]) REFERENCES dbo.Vehicles(ID,[Type]));
This way Cars will have details only about Car type, whereas TRUCKS only about Truck.
Such design is used to avoid polymorphic relationship, for instance
CREATE TABLE dbo.VEHICLE (
...,
ref_id INT NOT NULL,
-- PK of 'master' table
ref_name VARCHAR(20) NOT NULL,
-- here we put 'truck' or 'car', so we virtually have 2 parents;
-- in this case we cannot use FK constraint, the only thing that may
-- somehow enforce the logical constraint is writing a trigger
);
Update
Your updated table definition looks good to me. I guess the sample table was initially designed for Oracle and then ported to SQL Server. In Oracle, that unique constraint and primary key can use the same index, so there is no penalty for having both a PK and a unique constraint.

Good question. Theoretically you're right; there is no reason, a record can always be uniquely identified by its PK and the unique constraint will always be satisfied as long as this is true. However, if ID and Type have some relationship outside the bounds of the data layer (maybe this table is the data model for an Enum?), then it's unlikely that there would be two different IDs with the same Type because the uniqueness of Type is enforced elsewhere. The constraint also sets up an index that includes both ID and Type, making the table relatively efficient to be queried by that combination of columns.
You set up a unique constraint using the "Manage Indexes and Keys" option. Yes, this will create an index and unique constraint for the primary key, and an index and unique constraint for the combination of PK and Type.

I suspect the reason for having both columns in the UNIQUE constraint is related to the error message you mentioned. SQL Server (in common with other SQL DBMSs) has a limitation that a FOREIGN KEY constraint can only reference exactly the set of columns defined by a uniqueness constraint. So if a FOREIGN KEY constraint references two columns then those two columns must have a uniqueness constraint on them - even if other constraints already guarantee uniqueness. This is a pointless limitation but it is part of standard SQL.
The following example is quite similar and explains why a composite foreign key and nested uniqueness constraints can be useful.
http://consultingblogs.emc.com/davidportas/archive/2007/01/08/Distributed-Keys-and-Disjoint-Subtypes.aspx

Here you go:
Cars and trucks have different attributes, so they do not belong in one table. This is why I have two tables, Cars and Trucks.
Yet cars and trucks share some attributes, such as VIN (vehicle identification number). More to the point, VIN is unique. This is why I need a table Vehicles. A vehicle cannot be both a car and a truck, so I must make sure it is not possible to enter both (VIN=123456789, Type=Car) and (VIN=123456789, Type=Truck). This is why I have a PK on VIN only.
I must ensure that a vehicle cannot have corresponding rows in both Cars and Trucks tables. This is why I have Type column in Cars and Trucks, and this is why I want (VIN, Type) in child tables Cars and Trucks refer to the parent table Vehicles. The only reason why I need an additional unique constraint on (VIN, Type) is this: it is referred by FK constraints from child tables.
BTW, you could leave a comment on the blog - in that case sqlblog would send me a message. It is a coincidence that I noticed your question here; I was supposed to go skiing, only there is no snow.
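The pattern described in this answer can be tried out with a small sketch in Python's built-in sqlite3 module. This is SQLite rather than SQL Server, the column names follow the snippet from the question (ID standing in for the VIN), and the try_insert() helper and sample rows are made up. The PK on ID alone keeps a vehicle to one type, the extra UNIQUE (ID, Type) exists only so the child tables have something to reference, and the CHECK in each child pins its type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE Vehicles (
    ID   INTEGER NOT NULL PRIMARY KEY,            -- one row, hence one type, per vehicle
    Type TEXT    NOT NULL CHECK (Type IN ('Car', 'Truck')),
    UNIQUE (ID, Type)                             -- target for the child FKs
);
CREATE TABLE Cars (
    vehicle_id INTEGER NOT NULL PRIMARY KEY,
    Type       TEXT    NOT NULL CHECK (Type = 'Car'),
    FOREIGN KEY (vehicle_id, Type) REFERENCES Vehicles (ID, Type)
);
CREATE TABLE Trucks (
    vehicle_id INTEGER NOT NULL PRIMARY KEY,
    Type       TEXT    NOT NULL CHECK (Type = 'Truck'),
    FOREIGN KEY (vehicle_id, Type) REFERENCES Vehicles (ID, Type)
);
INSERT INTO Vehicles VALUES (1, 'Car');
INSERT INTO Cars VALUES (1, 'Car');
""")


def try_insert(table, vehicle_id, vehicle_type):
    """Return 'ok' or the constraint error for one child row."""
    # The table name is interpolated only because this helper is called
    # with the two fixed names below; never do this with user input.
    try:
        conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (vehicle_id, vehicle_type))
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)


# Vehicle 1 is a Car, so neither of these can create a Trucks row for it.
as_truck = try_insert("Trucks", 1, "Truck")  # no (1, 'Truck') row in Vehicles
as_car = try_insert("Trucks", 1, "Car")      # the Trucks CHECK refuses 'Car'
second_type = None
try:
    conn.execute("INSERT INTO Vehicles VALUES (1, 'Truck')")
except sqlite3.IntegrityError as exc:
    second_type = str(exc)                   # PK on ID blocks a second type

print(as_truck, as_car, second_type, sep="\n")
```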

When should I combine two foreign keys as a single foreign key?

My teacher said I should combine two foreign keys into a single primary key. But my thought process is that that would allow for only one combination of each foreign key.
Imagine I have a Product, Purchase, PurchaseDetail.
In PurchaseDetail I have two foreign keys, one for product and one for purchase. My teacher said that I should combine these two foreign keys into a single one. But can't a product be in many different purchases? And many purchases have many products?
I'm confused.
Thanks!
Edit: This is the SQL my teacher saw and then gave feedback upon. Thanks for the guidance guys. (I changed the essential to English)
create table Purchase
(
ID int primary key identity(1,1),
IDCliente int foreign key references Cliente(ID),
IDEmpleado int foreign key references Empleado(ID),
Fecha datetime not null,
Hora datetime not null,
Amount float not null,
)
create table PurchaseDetail
(
ID int primary key identity(1,1),
IDPurchase int foreign key references Purchase(ID),
IDProductOffering int foreign key references ProductOffering(ID),
Quantity int not null
)
create table Product
(
ID int primary key identity(1,1),
IDProveedor int foreign key references Proveedor(ID),
Nombre nvarchar(256) not null,
IDSubcategoria int foreign key references Subcategoria(ID),
IDMarca int foreign key references Marca(ID),
Fotografia nvarchar(1024) not null
)
create table ProductOffering
(
ID int primary key identity(1,1),
IDProduct int foreign key references Product(ID),
Price float not null,
OfferDate datetime not null,
)
Maybe I'm confused about good database schema design. Thanks again!

I imagine he's suggesting:
Product - one primary key (product id), which implies a unique product id
Purchase - one primary key (purchase id), which implies a unique purchase id
PurchaseDetail - two foreign keys (product id),(purchase id), plus one unique constraint on (product id + purchase id)
Plus some people argue that all tables should have their own primary key that doesn't depend on anything else (purchase detail id). Some DBMS make this mandatory.
This means that you can't have two rows in PurchaseDetail that have the same product and purchase. That makes sense, assuming there is also a quantity column on PurchaseDetail, so that one purchase can have more than one of each product.
Note that there is a difference between a unique constraint and a foreign key. A foreign key merely says that there should be an item with that id in the parent table - it will let you create as many references to that item as you want in the child table. You need to specify that the column or combination of columns are unique if you want to avoid duplicates. A primary key on the other hand implies a unique constraint.
Exact syntax for defining all of this varies by language, but those are the principles.

I don't agree with the single key, but they could be a compound key (which I tend to dislike). They can be two different fields each restricted to the ID in the corresponding tables.
Not sure why the same product ID would need to be listed more than once for a single purchase. Isn't that why you indicate quantity? Maybe there's a need for a separate line item for a purchase and a discount?

I believe thelem has answered correctly. But there is another option. You could add a new primary key column to the details table, so it looks like this:
detail_id int (PK)
product_id int (FK)
purchase_id int (FK)
This is not really necessary, but it could be useful if you ever need to reference the details table as a foreign key - having a single primary key field makes for smaller indexes and foreign key references (and they are a little easier to type).

That depends on what data you need to represent.
If you use the two foreign keys as the primary key for the purchase detail, a product may only occur once in each purchase. A purchase may however still contain many products, and a product may still occur in many purchases.
If the purchase detail contains more information, you may need to be able to use a product more than once in a purchase. For example, if the purchase detail contains size and color, and you want to buy a red T-shirt size XL and a blue T-shirt size S.
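To make the first case concrete, here is a small sketch with Python's built-in sqlite3 module (not SQL Server; the parent tables and their foreign keys are left out to keep it short, the column is called IDProduct here for simplicity, and the sample rows are invented). With (IDPurchase, IDProduct) as the primary key, a product can still appear in many purchases and a purchase can still hold many products; only a repeat of the same pair is refused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PurchaseDetail (
    IDPurchase INTEGER NOT NULL,
    IDProduct  INTEGER NOT NULL,
    Quantity   INTEGER NOT NULL,
    PRIMARY KEY (IDPurchase, IDProduct)   -- the two FKs combined into one key
);
""")

rows = [
    (1, 100, 2),  # purchase 1, product 100
    (1, 101, 1),  # purchase 1 also contains product 101
    (2, 100, 5),  # product 100 also appears in purchase 2
]
conn.executemany("INSERT INTO PurchaseDetail VALUES (?, ?, ?)", rows)
accepted = conn.execute("SELECT COUNT(*) FROM PurchaseDetail").fetchone()[0]

try:
    # Product 100 a second time within purchase 1
    conn.execute("INSERT INTO PurchaseDetail VALUES (1, 100, 3)")
    repeat_rejected = False
except sqlite3.IntegrityError:
    repeat_rejected = True

# With this key, "more of the same product" means raising Quantity instead.
conn.execute(
    "UPDATE PurchaseDetail SET Quantity = Quantity + 3 "
    "WHERE IDPurchase = 1 AND IDProduct = 100"
)
quantity = conn.execute(
    "SELECT Quantity FROM PurchaseDetail WHERE IDPurchase = 1 AND IDProduct = 100"
).fetchone()[0]
print(accepted, repeat_rejected, quantity)
```

If a purchase really needs two lines for the same product (the T-shirt case), this key is too strict and a separate detail id, or a wider key that includes the distinguishing columns, fits better.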

Perhaps he is suggesting a many-to-many table where its Primary Key is comprised of the Foreign Keys to the mapped tables:
PurchaseDetail:
ProductId int (FK)
PurchaseId int (FK)
PK(ProductId, PurchaseId)
This can also be modelled as
PurchaseDetail:
PurchaseDetailId int (PK, Identity)
ProductId int (FK)
PurchaseId int (FK)
The second form is useful if you want to refer to Purchase details elsewhere in your model, and also in some RDBMSs it is beneficial to have a PK on a monotonically increasing integer.
