Database design SQL Server

Database design SQL Server - sql-server

Say I have a database with multiple entitles like person, company, conference for which you have to keep track of say addresses. We can have multiple addresses for the same entity (person). One approach is to have a separate address table for each entity (person_address etc). Another approach is to have an address table which has primary key (Entity,id,address_type). In this approach we cannot use foreign keys from address table to entities .
So what is the better approach. Is there another way to do this ?
thanks

At a logical modeling POV your descriptions highlights the fact that the entities like person, company, conference etc have a common trait: they have zero, one or more addresses. If you would model this as a class hierarchy, perhaps you would create an Addressable class and have person, company and conference inherit from this Addressable class. You can apply the same reasoning to your data model and have an addresable table with an addressable_entity_id. The person, company, conference entities would 'inherit' this table. There are three established ways to implement table inheritance:
Class Table Inheritance
Single Table Inheritance
Concrete Table Inheritance
So you could model your tables like this:
create table Addresses (AddressId int not null identity(1,1) primary key, ...);
create table Addressable (AddressableId int not null identity (1,1) primary key, ...);
create table AddressableAddress (
AddressId int not null,
AddressableId int not null,
constraint AddressableAddressAddressId
foreign key (AddressId) references Addresses(AddressId),
constraint AddressableAddressAddressableId
foreign key (AddressableId) references Addressable(AddressableId));
create table Person (PersonId int not null identity(1,1) primary key,
AddressableId int not null,
...,
constraint PersonAddressableAddressableId
foreign key AddressableId references Addressable (AddressableId));
create table Company (CompanyId int not null identity(1,1) primary key,
AddressableId int not null,
...,
constraint CompanyAddressableAddressableId
foreign key AddressableId references Addressable (AddressableId));
Of course you have to find the right balance between absolute relational normal form and actual usability. In this scheme I propose for instance in order to insert a new Person one has to first a row in Addressable, get the AddressableId and then proceed and insert the person. This may or may nor work. BTW, there is a way to do such an insert in one single statement using the OUTPUT clause to chain two inserts:
insert into Addressable (AddressableType)
output inserted.AddressableId, #FirstName, #LastName
into Person (AddressableId, FirstName, LastName)
values (AddressableTypePerson);
But now is difficult to retrieve the newly inserted PersonId.

Technically if two people live at the same address you would not be completely normalized if there was simply a single one-to-many detail table for the row in TBLPerson called TBLAddress However, if you want just one instance per physical address you will incur the overhead of a many-to-many relation table of TBLPersonAddresses which FK's to TBLAddress
I would say that unless you expect multiple people at the same address to be the norm that I would simply have the TBLAddress with column personID as a detail to the TBLPerson
EDIT: And I tend to always use surrogate keys unless I have a specific reason not to do so.

Related

How to ensure uniqueness in many-to-many relationship table?

Users have many roles, roles have many users.
In USERS_ROLES table, have 3 columns: USERS_ROLES_ID, USER_ID, ROLE_ID
Usually USERS_ROLES_ID is just sequentially generated. Someone told me it's supposed to guarantee that user_id and role_id cross product are unique, so the primary key USERS_ROLES_ID should actually be some sort of combination of both USER_ID and ROLE_ID. How is this done, usually? (for example, USER_ID * (big number here) + ROLE_ID)?? Every example I could find uses a naive sequential primary key generation of the many-to-many join table.

Having a sequentially generated USERS_ROLE_ID primary key will not guarantee a unique combination of USER_ID and ROLE_ID. Adding a unique index on (USER_ID, ROLE_ID) will.

Gerrat is right. I found the full answer here: http://weblogs.sqlteam.com/jeffs/archive/2007/08/23/composite_primary_keys.aspx
Create table CustomerProducts
(
Customer_ProductID int identity primary key,
CustomerID int references Customers(CustomerID) not null,
ProductID int references Products(ProductID) not null,
OrderLimit int not null
)
This is what I see in perhaps most of the databases that I’ve worked
with over the years. The reason for designing a table in this manner?
Honestly, I don’t know! I can only surmise that it is because of the
lack of understanding what a primary key of a table really is, and
that it can be something other than an identity and that it can be
comprised of more than just a single column. As I mentioned, it seems
that many database architects are simply not aware of this fact.
Consider instead the following design:
Create table CustomerProducts (
CustomerID int references Customers(CustomerID) not null,
ProductID int references Products(ProductID) not null,
OrderLimit int not null,
Primary key (CustomerID, ProductID) )
Notice here that we have eliminated the identity column, and have
instead defined a composite (multi-column) primary key as the
combination of the CustomerID and ProductID columns. Therefore, we do
not have to create an additional unique constraint. We also do not
need an additional identity column that really serves no purpose. We
have not only simplified our data model physically, but we’ve also
made it more logically sound and the primary key of this table
accurately explains what it is this table is modeling – the
relationship of a CustomerID to a ProductID.

composite vs surrogate primary key

I am designing a database with the following requirements:
An organization can exist on its own
An organization can have any number of distinct terms (date range)
An organization can have any number of survey types (student, teacher, parent, etc)
A survey form is assigned a term and survey type
A structure for this might be:
Organization
- OrganizationId INT IDENTITY(1,1) NOT NULL PRIMARY KEY
Term
- TermId INT IDENTITY(1,1) NOT NULL PRIMARY KEY
- OrganizationId INT NOT NULL REFERENCES Organization(OrganizationId)
SurveyType
- SurveyTypeId IDENTITY(1,1) NOT NULL PRIMARY KEY
- OrganizationId INT NOT NULL REFERENCES Organization(OrganizationId)
SurveyForm
- SurveyFormId INT IDENTITY(1,1) NOT NULL PRIMARY KEY
- SurveyTypeId INT NOT NULL REFERENCES SurveyType(SurveyTypeId)
- TermId INT NOT NULL REFERENCES Term(TermId)
That structure keeps with what seems to be a popular emphasis on a single surrogate primary key. However that structure sacrifices data integrity because it is very easy for a SurveyForm record to have a TermId or SurveyTypeId from different Organizations.
To address data integrity, it would seem you would have to add OrganizationId and use it in the composite keys (OrganizationId, SurveyTypeId) and (OrganizationId, TermId). That is somewhat tolerable in this example but as the schema becomes more complete, the composite key sizes increase.
So my question is, how do people generally approach this now (most references online are from 2008 when I think its possible there were different database design concerns)? As a corollary, when is it acceptable to add foreign keys to a table to reduce the number of tables joined for common expressions?

Academically speaking, you can migrate the Organization key along both lineages. That's just 4 bytes, after all:
create table dbo.Organization (
OrganizationId INT IDENTITY(1,1) PRIMARY KEY
);
go
create table dbo.Term (
TermId INT IDENTITY(1,1) NOT NULL,
OrganizationId INT NOT NULL REFERENCES dbo.Organization(OrganizationId),
primary key (OrganizationId, TermId)
);
go
create table dbo.SurveyType (
SurveyTypeId int IDENTITY(1,1) NOT NULL,
OrganizationId INT NOT NULL REFERENCES dbo.Organization(OrganizationId),
primary key (OrganizationId, SurveyTypeId)
);
go
create table dbo.SurveyForm (
SurveyFormId INT IDENTITY(1,1) NOT NULL,
OrganizationId int not null,
SurveyTypeId INT NOT NULL,
TermId INT NOT NULL,
primary key (OrganizationId, SurveyTypeId, TermId),
foreign key (OrganizationId, TermId) references dbo.Term (OrganizationId, TermId),
foreign key (OrganizationId, SurveyTypeId) references dbo.SurveyType (OrganizationId, SurveyTypeId)
);
go
These tables definitely violate some NF, I don't remember which one exactly, but I'm sure you can handle it yourself.
While this design approach can almost be considered a must for a warehouse (esp. if you aggregate data from different sources), I would never recommend it for any real-life OLTP. Much simpler solution would be:
Perform all modifications via a stored procedure, which will have proper checks against this kind of possible discrepancy.
Make sure that no user would have permissions to directly add / modify data in the dbo.SurveyForm, thus circumventing the business rules implemented in the aforementioned SP.

I think there could be a way to avoid circular references, firstly by defining who really depends on who and removing redundant dependencies.
The question is... are Organizations allowed to be randomly associated to Terms without caring about any Survey association? I wonder if Organizations really need to be associated to a Term directly or indirectly through Surveys. If, for example, an Organization CANNOT be associated to a Term that is not associated to the Organization's Survey then the Organization-Term relationship is useless, if it is the other way around, then the Organization-SurveyType is not needed

How to manage uniqueness of a value that is stored in multiple databases tables

I have separate assets tables for storing different kind of physical and logical assets, such as:-
Vehicle table( ID, model, EngineSize, Drivername, lastMaintenanceDate)
Server table ( ID, IP, OSName, etc…)
VM (ID, Size, etc…).
VM_IP (VM_ID,IP)
Now the problems I have is:-
For the IP column in the server table and in the VM_IP table, I need this column to be unique in these two tables, so for example the database should not allow a server and a VM to have the same IP. In the current design I can only guarantee uniqueness for the table separately.
So can anyone advice on how I can handle this unique requirement on the databases level.
Regards
::EDITED::
I have currently the following database structure:-
Currently I see these points:-
I have introduced a redundant AssetTypeID column in the base Asset table, so I can know the asset type without having to join tables. This might break normalization.
In my above architecture , I cannot control (on the database level) which asset should have IP, which asset should not have IP and which asset can/cannot have multiple IPs.
So is there a way to improve my architecture to handle these two points.
Thanks in advance for any help.

Create an IP table and use foreign keys

If I were facing the problem in design level, I would add two more tables:
A valid_IP table (containing valid IP range)
A Network_Enabeled, base table for all entities that may have an
IP, like Server table, VM_IP ,... the primary key of this base
table will be the primary key of child tables.
In Network_Enabeled table, Having a foreign key from valid_IP table and setting a unique key on the filed will be the answer.
Hope be helpful.

You can use an indexed view.
CREATE VIEW YourViewName with SCHEMABINDING
as
...
GO
CREATE UNIQUE CLUSTERED INDEX IX_YourIndexName
on YourViewName (..., ...)

Based on your edit, you can introduce a superkey on the asset table and use various constraints to enforce most of what it sounds like you're looking for:
create table Asset (
AssetID int not null primary key,
AssetTypeID int not null
--Skip all of the rest, foreign keys, etc, irrelevant to example
,constraint UQ_Asset_TypeCheck
UNIQUE (AssetID,AssetTypeID) --This is the superkey
)
The above means that the AssetTypeID column can now be checked/enforced in other tables, and there's no risk of inconsistency
create table Servers (
AssetID int not null primary key,
AssetTypeID as 1 persisted,
constraint FK_Servers_Assets FOREIGN KEY (AssetID)
references Asset (AssetID), --Strictly, this will become redundant
constraint FK_Servers_Assets_TypeCheck FOREIGN KEY (AssetID,AssetTypeID)
references Asset (AssetID,AssetTypeID)
)
So, in the above, we enforce that all entries in this table must actually be of the correct asset type, by making it a fixed computed column that is then used in a foreign key back to the superkey.
--So on for other asset types
create table Asset_IP (
AssetID int not null,
IPAddress int not null primary key, --Wrong type, for IPv6
AssetTypeID int not null,
constraint FK_Asset_IP_Assets FOREIGN KEY (AssetID)
references Asset (AssetID), --Again, redundant
constraint CK_Asset_Types CHECK (
AssetTypeID in (1/*, Other types allowed IPs */)),
constraint FK_Asset_IP_Assets_TypeCheck FOREIGN KEY (AssetID,AssetTypeID)
references Asset (AssetID,AssetTypeID)
)
And now, above, we again reference the superkey to ensure that we've got a local (to this table) correct AssetTypeID value, which we can then use in a check constraint to limit which asset types are actually allowed entries in this table.
create unique index UQ_Asset_SingleIPs on Asset_IP (AssetID)
where AssetTypeID in (1/* Type IDs that are only allowed 1 IP address */)
And finally, for certain AssetTypeID values, we ensure that this table only contains one row for that AssetID.
I hope that gives you enough ideas of how to implement your various checks based on types. If you want/need to, you can now construct some views (through which the rest of your code will interact) which hides the extra columns and provides triggers to ease INSERT statements.
On a side note, I'd recommend picking a convention and sticking to it when it comes to table naming. My preferred one is to use the plural/collective name, unless the table is only intended to contain one row. So I'd rename Asset as Assets, for example, or Asset_IP as Asset_IPs. At the moment, you have a mixture.

Database design - composite key relationship issue

I had posted a similar question before, but this is more specific. Please have a look at the following diagram:
The explanation of this design is as follows:
Bakers produce many Products
The same Product can be produced by more than one Baker
Bakers change their pricing from time-to-time for certain (of their) Products
Orders can be created, but not necessarily finalised
The aim here is to allow the store manager to create an Order "Basket" based on whatever goods are required, and also allow the system being created to determine the best price at that time based on what Products are contained within the Order.
I therefore envisaged the ProductOrders table to initially hold the productID and associated orderID, whilst maintaining a null (undetermined) value for bakerID and pricingDate, as that would be determined and updated by the system, which would then constitute a finalised order.
Now that you have an idea of what I am trying to do, please advise me on how to to best set these relationships up.
Thank you!

If I understand correctly, an unfinalised order is not yet assigned a baker / pricing (meaning when an order is placed, no baker has yet been selected to bake the product).
In which case, the order is probably placed against the Products Table and then "Finalized" against the BakersProducts table.
A solution could be to give ProductsOrders 2 separate "ProductID's", one being for the original ordered ProductId (i.e. Non Nullable) - say ProductId, and the second being part of the Foreign key to the assigned BakersProducts (say ProductId2). Meaning that in ProductsOrders, the composite foreign keys BakerId, ProductId2 and PricingDate are all nullable, as they will only be set once the order is Finalized.
In order to remove this redundancy, what you might also consider is using surrogate keys instead of the composite keys. This way BakersProducts would have a surrogate PK (e.g. BakersProductId) which would then be referenced as a nullable FK in ProductsOrders. This would also avoid the confusion with the Direct FK in ProductsOrders to Product.ProductId (which from above, was the original Product line as part of the Order).
HTH?
Edit:
CREATE TABLE dbo.BakersProducts
(
BakerProductId int identity(1,1) not null, -- New Surrogate PK here
BakerId int not null,
ProductId int not null,
PricingDate datetime not null,
Price money not null,
StockLevel bigint not null,
CONSTRAINT PK_BakerProducts PRIMARY KEY(BakerProductId),
CONSTRAINT FK_BakerProductsProducts FOREIGN KEY(ProductId) REFERENCES dbo.Products(ProductId),
CONSTRAINT FK_BakerProductsBaker FOREIGN KEY(BakerId) REFERENCES dbo.Bakers(BakerId),
CONSTRAINT U_BakerProductsPrice UNIQUE(BakerId, ProductId, PricingDate) -- Unique Constraint mimicks the original PK for uniqueness ... could also use a unique index
)
CREATE TABLE dbo.ProductOrders
(
OrderId INT NOT NULL,
ProductId INT NOT NULL, -- This is the original Ordered Product set when order is created
BakerProductId INT NULL, -- This is nullable and gets set when Order is finalised with a baker
OrderQuantity BIGINT NOT NULL,
CONSTRAINT FK_ProductsOrdersBakersProducts FOREIGN KEY(BakersProductId) REFERENCES dbo.BakersProducts(BakerProductId)
.. Other Keys here
)

Bidirectional foreign key constraint

I'm thinking of designing a database schema similar to the following:
Person (
PersonID int primary key,
PrimaryAddressID int not null,
...
)
Address (
AddressID int primary key,
PersonID int not null,
...
)
Person.PrimaryAddressID and Address.PersonID would be foreign keys to the corresponding tables.
The obvious problem is that it's impossible to insert anything into either table. Is there any way to design a working schema that enforces every Person having a primary address?

"I believe this is impossible. You cannot create an Address Record until you know the ID of the person and you cannot insert the person record until you know an AddressId for the PrimaryAddressId field."
On the face of it, that claim seems SO appealing. However, it is quite propostrous.
This is a very common kind of problem that the SQL DBMS vendors have been trying to attack for perhaps decades already.
The key is that all constraint checking must be "deferred" until both inserts are done. That can be achieved under different forms. Database transactions may offer the possibility to do something like "SET deferred constraint checking ON", and you're done (were it not for the fact that in this particular example, you'd likely have to mess very hard with your design in order to be able to just DEFINE the two FK constraints, because one of them simply ISN'T a 'true' FK in the SQL sense !).
Trigger-based solutions as described here achieve essentially the same effect, but those are exposed to all the maintenance problems that exist with application-enforced integrity.
In their work, Chris Date & Hugh Darwen describe what is imo the true solution to the problem : multiple assignment. That is, essentially, the possibility to compose several distinct update statements and have the DBMS act upon it as if that were one single statement. Implementations of that concept do exist, but you won't find any that talks SQL.

This is a perfect example of many-to-many relationship. To resolve that you should have intermediate PERSON_ADDRESS table. In other words;
PERSON table
person_id (PK)
ADDRESS table
address_id (PK)
PERSON_ADDRESS
person_id (FK) <= PERSON
address_id (FK) <= ADDRESS
is_primary (BOOLEAN - Y/N)
This way you can assign multiple addresses to a PERSON and also reuse ADDRESS records in multiple PERSONs (for family members, employees of the same company etc.). Using is_primary field in PERSON_ADDRESS table, you can identify if that person_addrees combination is a primary address for a person.

We mark the primary address in our address table and then have triggers that enforces only record per person can have it (but one record must have it). If you change the primary address, it will update the old primary address as well as the new one. If you delete a primary address and other addresses exist, it will promote one of them (basesd ona series of rules) to the primary address. If the address is inserted and is the first address inserted, it will mark that one automatically as the primary address.

The second FK (PersonId from Address to Person) is too restrictive, IMHO. Are you suggesting that one address can only have a single person?

From your design, it seems that an address can apply to only one person, so just use the PersonID as the key to the address table, and drop the AddressID key field.

I know I'll probably be crucified or whatever, but here goes...
I've done it like this for my "particular very own unique and non-standard" business need ( =( God I'm starting to sound like SQL DDL even when I speak).
Here's an exaxmple:
CREATE TABLE IF NOT EXISTS PERSON(
ID INT,
CONSTRAINT PRIMARY KEY (ID),
ADDRESS_ID INT NOT NULL DEFAULT 1,
DESCRIPTION VARCHAR(255),
CONSTRAINT PERSON_UQ UNIQUE KEY (ADDRESS_ID, ...));
INSERT INTO PERSON(ID, DESCRIPTION)
VALUES (1, 'GOVERNMENT');
CREATE TABLE IF NOT EXISTS ADDRESS(
ID INT,
CONSTRAINT PRIMARY KEY (ID),
PERSON_ID INT NOT NULL DEFAULT 1,
DESCRIPTION VARCHAR(255),
CONSTRAINT ADDRESS_UQ UNIQUE KEY (PERSON_ID, ...),
CONSTRAINT ADDRESS_PERSON_FK FOREIGN KEY (PERSON_ID) REFERENCES PERSON(ID));
INSERT INTO ADDRESS(ID, DESCRIPTION)
VALUES (1, 'ABANDONED HOUSE AT THIS ADDRESS');
ALTER TABLE PERSON ADD CONSTRAINT PERSON_ADDRESS_FK FOREIGN KEY (ADDRESS_ID) REFERENCES ADDRESS(ID);
<...life goes on... whether you provide and address or not to the person and vice versa>
I defined one table, then the other table referencing the first and then altered the first to reflect the reference to the second (which didn't exist at the time of the first table's creation). It's not meant for a particular database; if I need it I just try it and if it works then I use it, if not then I try to avoid having that need in the design (I can't always control that, sometimes the design is handed to me as-is). if you have an address without a person then it belongs to the "government" person. If you have a "homeless person" then it gets the "abandoned house" address. I run a process to determine which houses have no users