How can I enforce referential integrity for optionally present uniquekeys?

How can I enforce referential integrity for optionally present uniquekeys? - database

We have multiple offices, and within each office there are multiple departments (some departments have employees in multiple offices). We have two existing systems that identify employees in different ways: in one, employees are identified by IDA; in the other employees are identified by IDB.
In the cases where an employee is identified by an IDA, we need to identify that employee's supervisor, if any, in SUPERVISOR_IDA.
My table looks like this:
IDA
IDB
SUPERVISOR_IDA
OFFICE
DEPARTMENT
Employees could have positions in more than one office, or in more than one department, so the same IDA or IDB could exist more than once, but with a different office, department or both.
The problem is that IDA could be null (and if so, then IDB is not null) or IDB could be null (and if so, then IDA is not null), or both could be present.
I'm trying to set up unique and/or primary keys and constraints to ensure the integrity of the database.
So I created a unique key on (IDA, IDB, OFFICE, DEPARTMENT).
Here's my problem:
I need to ensure that employees reference their supervisors. I wanted to have a self-referencing foreign key so that removal of supervisors would not leave orphan employee records who have a non-null SUPERVISOR_IDA. But since I was required to include IDB in my unique key on this table, if I create this foreign key I am required to include IDB as such:
local PARENT_IDA -> reference IDA
local OFFICE -> reference OFFICE
local DEPARTMENT -> reference DEPARTMENT
**local IDB -> reference IDB**
This is a problem because the employee IDB should NOT be the same as the supervisor IDB.
I know it seems like I'm trying to do too many things in one table perhaps, but in reality my domain is quite difficult to describe and so I created the employee/office/department as an example to illustrate my problems. I really cannot split IDA and IDB into separate tables, as they are intertwined in some problematic ways and the presence of one, the other, or both, has some important meaning that cannot be separated.
At first I wanted to set up a unique key on (IDA, OFFICE, DEPARTMENT) in addition to the aforementioned unique key, but unlike with unique keys that consist of a single column, composite unique keys will treat (null, 'A') and (null, 'A') as duplicates instead of allowing the null column to avoid violation of the uniqueness constraint.

I think the problem is with the model. The table should have a primary key (and if IDA or IDB can be null then they are not PK columns) and the foreign key should reference the PK.
I think you are trying to use an FK against a unique index to enforce a bunch of cross-row validation rules in the data model, such as "an employee can only be supervised by someone in the same office and department" and "an IDA employee can only be supervised by another IDA employee".
In practice those are very hard to enforce when you consider multiple people potentially updating different columns on different rows at the same time.
That said, you could try adding columns DEPT_IDA and OFFICE_IDA and using triggers to set them from DEPT and OFFICE only when IDA is set. Then create the UK on those columns

I think your data model is wrong. You should have only one EMPLOYEE record per employee. Then you can have unique keys on each of IDA and IDB.
Because employees work at multiple offices then you need a table to represent that; POSTS would be an intersection table between OFFICES and EMPLOYEES.
The point being that SUPERVISOR_IDA and SUPERVISOR_IDB are properties of POSTS, and as such you can enforce a foreign key between those columns and the EMPLOYEES table. Use check constraints to ensure that if the POSTS record is identified by an EMPLOYEE_IDA the SUPERVISOR_IDA is populated and ditto for EMPLOYEE_IDB.

I am not sure if this works in Oracle but in SQL Server I would create a trigger that on the SUPERVISORS table that fires on UPDATE and DELETE. The trigger would query the EMPLOYEES Table for any records where SUPERVISORS.SUPERVISOR_IDA = EMPLOYEES.SUPERVISOR_IDA. If any records where found, it would roll back the transaction.
I found this link outlines what you need to do.
http://www.techonthenet.com/oracle/triggers/before_delete.php

Related

Database schema design and foreign keys

I'm confused what a schema should be implemented in a following case:
I have two core tables user and company and a set of 'inherited'(in terms of ORM) tables like user_type_1, user_type2 that extend user table with an own unique set of fields. And the same for company_type_1, company_type_2 tables that extend company table with an own unique set of fields.
Also I have a address table. Every User and Company in a system can have own address.
My initial idea was to add address_id FK to address(id) PK to root user and company tables. This way every User or Company in a system can be associated with own address.
My colleague proposed another database schema design where user_id and company_id FKs have to be added to address table.
I'm really confused with this proposal right now. Could you please explain or this design can take a place ?

If you add address_id to the user and company tables, you're adding functional dependencies user_id -> address_id and company_id -> address_id. This would mean that each user or company can be associated with a single address, but each address can potentially be associated with multiple users or companies (unless you add a constraint to prevent that).
Your colleague's proposal to add user_id and company_id to the address table adds functional dependencies address_id -> user_id and address_id -> company_id meaning each address can be associated with one user, one company or both, but each user or company can potentially be associated with multiple addresses (unless you add a constraint to prevent that).
Between the two options you listed, there's a third - separate tables for the associations, e.g. user_addresses (user_id FK, address_id FK) and company_addresses (company_id FK, address_id FK). This can handle any cardinality (one-to-one/one-to-many/many-to-one/many-to-many) depending on the PK (address_id, user_id/company_id or both) and possible unique constraints.
Which design is correct depends on your requirements.

Multiple foreign keys to the same table in a database?

I have a SQL Server database and it contains a table to record a employee salary.
It has 3 columns declared as foreign keys, and reference to the employee table's column, employee_id:
employee_id
submitted_by
confirmed_by
But is it best practice to make it all as FK, or do I only need employee_id?
Because in my application, submitted_by and confirmed_by will be selected by a drop down list and assume it exist on employee table.
Thanks you for advice.

Yes, since all users of your system are also Employees modelled by your system, if you wish to have Referential Integrity (RI) enforced in the database, all three columns should have foreign keys back to the referenced employee table. Note that since confirmed by sounds like part of a workflow process, where the user confirming may not be available at the time the record is inserted, you can make the field confirmed_by in table EmployeeSalary nullable (confirmed_by INT NULL), in which case RI will only be enforced at the later time when the field is actually populated.
You should name each of the foreign keys appropriately by expressing the role in the foreign key, e.g.
FK_EmployeeSalary_SalariedEmployee
FK_EmployeeSalary_EmployeeSubmittedBy
FK_EmployeeSalary_EmployeeConfirmedBy
Although the front end may restrict choices via the drop down, referential integrity is still beneficial:
Protect against bugs, e.g. where the submitted by employee is omitted (in the case of a non-nullable FK) or the employee provided doesn't exist in the employees table.
Prevent accidental deletion of an employee to which foreign key data is linked.
There is a (very) minor performance penalty on RI whereby the DB will need to check the existence of the PK in the employee table - in most instances this will be negligible.

Any column that references a key in another table should be declared as a foreign key. This way, if you mistakenly try to put a nonexistent value there, the database will report an error.

How to implement this data structure in SQL tables

I have a problem that can be summarized as follow:
Assume that I am implementing an employee database. For each person depends on his position, different fields should be filled. So for example if the employee is a software engineer, I have the following columns:
Name
Family
Language
Technology
CanDevelopWeb
And if the employee is a business manager I have the following columns:
Name
Family
FieldOfExpertise
MaximumContractValue
BonusRate
And if the employee is a salesperson then some other columns and so on.
How can I implement this in database schema?
One way that I thought is to have some related tables:
CoreTable:
Name
Family
Type
And if type is one then the employee is a software developer and hence the remaining information should be in table SoftwareDeveloper:
Language
Technology
CanDevelopWeb
For business Managers I have another table with columns:
FieldOfExpertise
MaximumContractValue
BonusRate
The problem with this structure is that I am not sure how to make relationship between tables, as one table has relationship with several tables on one column.
How to enforce relational integrity?

There are a few schools of thought here.
(1) store nullable columns in a single table and only populate the relevant ones (check constraints can enforce integrity here). Some people don't like this because they are afraid of NULLs.
(2) your multi-table design where each type gets its own table. Tougher to enforce with DRI but probably trivial with application or trigger logic.
The only problem with either of those, is as soon as you add a new property (like CanReadUpsideDown), you have to make schema changes to accommodate for that - in (1) you need to add a new column and a new constraint, in (2) you need to add a new table if that represents a new "type" of employee.
(3) EAV, where you have a single table that stores property name and value pairs. You have less control over data integrity here, but you can certainly constraint the property names to certain strings. I wrote about this here:
What is so bad about EAV, anyway?

You are describing one ("class per table") of the 3 possible strategies for implementing the category (aka. inheritance, generalization, subclass) hierarchy.
The correct "propagation" of PK from the parent to child tables is naturally enforced by straightforward foreign keys between them, but ensuring both presence and the exclusivity of the child rows is another matter. It can be done (as noted in the link above), but the added complexity is probably not worth it and I'd generally recommend handling it at the application level.

I would add a field called EmployeeId in the EmployeeTable
I'd get rid of Type
For BusinessManager table and SoftwareDeveloper for example, I'll add EmployeeId
From here, you can then proceed to create Foreign Keys from BusinessManager, SoftwareDeveloper table to Employee

To further expand on your one way with the core table is to create a surrogate key based off an identity column. This will create a unique employee id for each employee (this will help you distinguish between employees with the same name as well).
The foreign keys preserve your referential integrity. You wouldn't necessarily need EmployeeTypeId as someone else mentioned as you could filter on existence in the SoftwareDeveloper or BusinessManagers tables. The column would instead act as a cached data point for easier querying.
You have to fill in the types in the below sample code and rename the foreign keys.
create table EmployeeType(
EmployeeTypeId
, EmployeeTypeName
, constraint PK_EmployeeType primary key (EmployeeTypeId)
)
create table Employees(
EmployeeId int identity(1,1)
, Name
, Family
, EmployeeTypeId
, constraint PK_Employees primary key (EmployeeId)
, constraint FK_blahblah foreign key (EmployeeTypeId) references EmployeeType(EmployeeTypeId)
)
create table SoftwareDeveloper(
EmployeeId
, Language
, Technology
, CanDevelopWeb
, constraint FK_blahblah foreign key (EmployeeId) references Employees(EmployeeId)
)
create table BusinessManagers(
EmployeeId
, FieldOfExpertise
, MaximumContractValue
, BonusRate
, constraint FK_blahblah foreign key (EmployeeId) references Employees(EmployeeId)
)

No existing SQL engine has solutions that make life easy on you in this situation.
Your problem is discussed at fairly large in "Practical Issues in Database Management", in the chapter on "entity subtyping". Commendable reading, not only for this particular chapter.
The proper solution, from a logical design perspective, would be similar to yours, but for the "type" column in the core table. You don't need that, since you can derive the 'type' from which non-core table the employee appears in.
What you need to look at is the business rules, aka data constraints, that will ensure the overall integrity (aka consistency) of the data (of course whether any of these actually apply is something your business users, not me, should tell you) :
Each named employee must have exactly one job, and thus some job detail somewhere. iow : (1) no named employees without any job detail whatsoever and (2) no named employees with >1 job detail.
(3) All job details must be for a named employee.
Of these, (3) is the only one you can implement declaratively if you are using an SQL engine. It's just a regular FK from the non-core tables to the core table.
(1) and (2) could be defined declaratively in standard SQL, using either CREATE ASSERTION or a CHECK CONSTRAINT involving references to other tables than the one the CHECK CONSTRAINT is defined on, but neither of those constructs are supported by any SQL engine I know.
One more thing about why [including] the 'type' column is a rather poor choice to make : it changes how constraint (3) must be formulated. For example, you can no longer say "all business managers must be named employees", but instead you'd have to say "all business managers are named employees whose type is <type here>". Iow, the "regular FK" to your core table has now become a reference to a VIEW on your core table, something you might want to declare as, say,
CREATE TABLE BUSMANS ... REFERENCES (SELECT ... FROM CORE WHERE TYPE='BM');
or
CREATE VIEW BM AS (SELECT ... FROM CORE WHERE TYPE='BM');
CREATE TABLE BUSMANS ... REFERENCES BM;
Once again something SQL doesn't allow you to do.

You can use all fields in the same table, but you'll need an extra table named Employee_Type (for example) and here you have to put Developer, Business Manager, ... of course with an unique ID. So your relation will be employee_type_id in Employee table.
Using PHP or ASP you can control what field you want to show depending the employee_type_id (or text) in a drop-down menu.

You are on the right track. You can set up PK/FK relationships from the general person table to each of the specialized tables. You should add a personID to all the tables to use for the relationship as you do not want to set up a relationship on name because it cannot be a PK as it is not unique. Also names change, they are a very poor choice for an FK relationship as a name change could cause many records to need to change. It is important to use separate tables rather than one because some of those things are in a one to many relationship. A Developer for instnce may have many differnt technologies and that sort of thing should NEVER be stored in a comma delimted list.
You could also set up trigger to enforce that records can only be added to a specialty table if the main record has a particular personType. However, be wary of doing this as you wil have peopl who change roles over time. Do you want to lose the history of wha the person knew when he was a developer when he gets promoted to a manager. Then if he decides to step back down to development (A frequent occurance) you would have to recreate his old record.

Designing a conditional database relationship in SQL Server

I have three basic types of entities: People, Businesses, and Assets. Each Asset can be owned by one and only one Person or Business. Each Person and Business can own from 0 to many Assets. What would be the best practice for storing this type of conditional relationship in Microsoft SQL Server?
My initial plan is to have two nullable foreign keys in the Assets table, one for People and one for Businesses. One of these values will be null, while the other will point to the owner. The problem I see with this setup is that it requires application logic in order to be interpreted and enforced. Is this really the best possible solution or are there other options?

Introducing SuperTypes and SubTypes
I suggest that you use supertypes and subtypes. First, create PartyType and Party tables:
CREATE TABLE dbo.PartyType (
PartyTypeID int NOT NULL identity(1,1) CONSTRAINT PK_PartyType PRIMARY KEY CLUSTERED
Name varchar(32) CONSTRAINT UQ_PartyType_Name UNIQUE
);
INSERT dbo.PartyType VALUES ('Person'), ('Business');
SuperType
CREATE TABLE dbo.Party (
PartyID int identity(1,1) NOT NULL CONSTRAINT PK_Party PRIMARY KEY CLUSTERED,
FullName varchar(64) NOT NULL,
BeginDate smalldatetime, -- DOB for people or creation date for business
PartyTypeID int NOT NULL
CONSTRAINT FK_Party_PartyTypeID FOREIGN KEY REFERENCES dbo.PartyType (PartyTypeID)
);
SubTypes
Then, if there are columns that are unique to a Person, create a Person table with just those:
CREATE TABLE dbo.Person (
PersonPartyID int NOT NULL
CONSTRAINT PK_Person PRIMARY KEY CLUSTERED
CONSTRAINT FK_Person_PersonPartyID FOREIGN KEY REFERENCES dbo.Party (PartyID)
ON DELETE CASCADE,
-- add columns unique to people
);
And if there are columns that are unique to Businesses, create a Business table with just those:
CREATE TABLE dbo.Business (
BusinessPartyID int NOT NULL
CONSTRAINT PK_Business PRIMARY KEY CLUSTERED
CONSTRAINT FK_Business_BusinessPartyID FOREIGN KEY REFERENCES dbo.Party (PartyID)
ON DELETE CASCADE,
-- add columns unique to businesses
);
Usage and Notes
Finally, your Asset table will look something like this:
CREATE TABLE dbo.Asset (
AssetID int NOT NULL identity(1,1) CONSTRAINT PK_Asset PRIMARY KEY CLUSTERED,
PartyID int NOT NULL
CONSTRAINT FK_Asset_PartyID FOREIGN KEY REFERENCES dbo.Party (PartyID),
AssetTag varchar(64) CONSTRAINT UQ_Asset_AssetTag UNIQUE
);
The relationship the supertype Party table shares with the subtype tables Business and Person is "one to zero-or-one". Now, while the subtypes generally have no corresponding row in the other table, there is the possibility in this design of having a Party that ends up in both tables. However, you may actually like this: sometimes a person and a business are nearly interchangeable. If not useful, while a trigger to enforce this will be fairly easily done, the best solution is probably to add the PartyTypeID column to the subtype tables, making it part of the PK & FK, and put a CHECK constraint on the PartyTypeID.
The beauty of this model is that when you want to create a column that has a constraint to a business or a person, then you make the constraint to the appropriate table instead of the party table.
Also, if desired, turning on cascade delete on the constraints can be useful, as well as an INSTEAD OF DELETE trigger on the subtype tables that instead delete the corresponding IDs from the supertype table (this guarantees no supertype rows that have no subtype rows present). These queries are very simple and work at the entire-row-exists-or-doesn't-exist level, which in my opinion is a gigantic improvement over any design that requires checking column value consistency.
Also, please notice that in many cases columns that you would think should go in one of the subtype tables really can be combined in the supertype table, such as social security number. Call it TIN (taxpayer identification number) and it works for both businesses and people.
ID Column Naming
The question of whether or not to call the column in the Person table PartyID, PersonID, or PersonPartyID is your own preference, but I think it's best to call these PersonPartyID or BusinessPartyID—tolerating the cost of the longer name, this avoids two types of confusion. E.g., someone unfamiliar with the database sees BusinessID and doesn't know this is a PartyID, or sees PartyID and doesn't know it is restricted by foreign key to just those in the Business table.
If you want to create views for the Party and Business tables, they can even be materialized views since it's a simple inner join, and there you could rename the PersonPartyID column to PersonID if you were truly so inclined (though I wouldn't). If it's of great value to you, you can even make INSTEAD OF INSERT and INSTEAD OF UPDATE triggers on these views to handle the inserts to the two tables for you, making the views appear completely like their own tables to many application programs.
Making Your Proposed Design Work As-Is
Also, I hate to mention it, but if you want to have a constraint in your proposed design that enforces only one column being filled in, here is code for that:
ALTER TABLE dbo.Assets
ADD CONSTRAINT CK_Asset_PersonOrBusiness CHECK (
CASE WHEN PersonID IS NULL THEN 0 ELSE 1 END
+ CASE WHEN BusinessID IS NULL THEN 0 ELSE 1 END = 1
);
However, I don't recommend this solution.
Final Thoughts
A natural third subtype to add is organization, in the sense of something that people and businesses can have membership in. Supertype and subtype also elegantly solve customer/employee, customer/vendor, and other problems similar to the one you presented.
Be careful not to confuse "Is-A" with "Acts-As-A". You can tell a party is a customer by looking in your order table or viewing the order count, and may not need a Customer table at all. Also don't confuse identity with life cycle: a rental car may eventually be sold, but this is a progression in life cycle and should be handled with column data, not table presence--the car doesn't start out as a RentalCar and get turned into a ForSaleCar later, it's a Car the whole time. Or perhaps a RentalItem, maybe the business will rent other things too. You get the idea.
It may not even be necessary to have a PartyType table. The party type can be determined by the presence of a row in the corresponding subtype table. This would also avoid the potential problem of the PartyTypeID not matching the subtype table presence. One possible implementation is to keep the PartyType table, but remove PartyTypeID from the Party table, then in a view on the Party table return the correct PartyTypeID based on which subtype table has the corresponding row. This won't work if you choose to allow parties to be both subtypes. Then you would just stick with the subtype views and know that the same value of BusinessID and PersonID refer to the same party.
Further Reading On This Pattern
Please see A Universal Person and Organization Data Model for a more complete and theoretical treatment.
I recently found the following articles to be useful for describing some alternate approaches for modeling inheritance in a database. Though specific to Microsoft's Entity Framework ORM tool, there's no reason you couldn't implement these yourself in any DB development:
Table Per Hierarchy
Table Per Type (this is what I advocate above as the only fully normalized method of implementing inheritance in a database)
Table Per Concrete Class
Or a more brief overview of these three ways: How to choose an inheritance strategy
P.S. I have switched, more than once, my opinion on column naming of IDs in subtype tables, due to having more experience under my belt.

You don't need application logic to enforce this. The easiest way is with a check constraint:
(PeopleID is null and BusinessID is not null) or (PeopleID is not null and BusinessID is null)

You can have another entity from which Person and Business "extend". We call this entity Party in our current project. Both Person and Business have a FK to Party (is-a relationship). And Asset may have also a FK to Party (belongs to relationship).
With that said, if in the future an Asset can be shared by multiple instances, is better to create m:n relationships, it gives flexibility but complicates the application logic and the queries a bit more.

ErikE's answer gives a good explanation on how to go about the supertype / subtype relationship in tables and is likely what I'd go for in your situation, however, it doesn't really address the question(s) you've posed which are also interesting, namely:
What would be the best practice for storing this type of conditional relationship in Microsoft SQL Server?
...are there other options?
For those I recommend this blog entry on TechTarget which has an excerpt from excerpt from "A Developer's Guide to Data Modeling for SQL Server, Covering SQL Server 2005 and 2008" by Eric Johnson and Joshua Jones which addresses 3 possible options.
In summary they are:
Supertype Table - Almost matches what you've proposed, have a table with some fields that will always be null when others are filled. Good when only a couple of fields aren't shared. So depending on how different Business and People are you could possibly combine them into one table, Owners perhaps, and then just have OwnerID in your Asset table.
Subtype Tables - Basically the opposite of what Supertype tables are and is what you have just now. Here we have lots of unique fields and maybe one or two the same so we just have the repeated fields appear in each table. As you are finding this isn't really suitable for your situation.
Supertype and Subtype Tables - A combination of both of the above where the matching fields are placed in a single table and the unique ones in separate tables and matching IDs are used to join the record from one table to the other. This matches ErikE's proposed solution and, as mentioned, is the one I would favour as well.
Sadly it doesn't go on to explain which, if any, are best practice but it is certainly a good read to get an idea of the options that are out there.

YOu can enforce the logic with a trigger instead. Then no matter how the record is changed, only one of the fileds will be filled in.
You could also have a PeopleAsset table and a BusinessAsset table, but stillwould have the problem of enforcing that only one of them has a record.

An asset would have a foreign key to the owning person, and you should setup an association table to link assets and businesses. As said in other comments, you can use triggers and/or constraints to ensure that the data stays in a consistent state. ie. when you delete a business, delete the lines in your association table.

Table People, Businesses both can use UUID as primary key, and union both to a view for sql join purpose.
so you can simply use one foreign key column in Assets relation to both People and Businesses, because UUID is nearly unique. And you can simply query like：
select * from Assets
join view_People_Businesses as v on v.id = Assets.fk

Do I need to define a new primary key field for each table?

I have a few database tables that really only require a unique id that references another table e.g.
Customer Holiday
******** *******
ID (PK) ---> CustomerID (PK)
Forename From
Surname To
....
These tables such as Holiday, only really exist to hold information regarding a Customer. Therefore, do I need to specify a separate field to hold the ID for the holiday? i.e.
Holiday
*******
ID (PK)
CustomerID (FK)
...
Or would I be ok, in this instance, to just set the CustomerID as the primary key in the table?
Regards,
James.

This really depends on what you are doing.
if each customer can have only 1 holiday, then yes, you could make the customerid the primary key.
If each customer can have multiple holidays, then no, you would want to add a new id column, make it the primary. This allows you to select holidays by each customer AND to select individual records by their unique id.
Additionally if each customer can only have 1 holiday, I'd just add the holiday information to the table, as a one-to-one relationship is typically un-necessary.

If I understand your question correctly, you could only use the Customer table as a primary key in Holiday if there will never be any other holiday for that customer in the table. In other words, two holidays for one customer breaks using the Customer id as a primary key.

If there will ever be an object-oriented program associated with this database, each entity (each row) must have a unique key.
Your second design assures that each instance of Holiday can be uniquely identified and processed by an OO application using a simple Object-Relational Mapping.
Generally, it's best to assure that every entity in the database has a unique, immutable, system-assigned ("surrogate") key. Other "natural" keys can have unique indexes, constraints, etc., to fit the business logic.

Previous answer correct, but also remember, you could have 2 seperate primary keys in each table, and the "holiday" table would have the foreign key to CustomerId.
Then you could manage the assignment of holidays to customers in your code, to make sure that only one holiday can be assigned to a customer, but this brings in the problem concurrency, being 2 people adding a holiday to a customer at the same time will most probably result in a customer having 2 holidays.
You could even place holiday fields in the customer table if a customer can only be created with a holiday, but this design is messy, and not really advised
So once again, option in your question 2 still the best way to go, just giving you your options.

In practice I've found that every table should have a unique primary key identifying the records in those tables. All relationships with other tables should be explicitly declared.
This helps others understand the relationships better, especially if they use a tool to reverse-engineer the schema into a visual representation.
In addition, it gives you more flexibility to expand your solution in the future. You may only have one holiday per customer now, but this is much more difficult to change if you make customer ID the primary key.
If you want to mandate the uniqueness of customer in the holiday table, create a unique index on that foreign key. In fact, this could improve performance when querying on customer ID (although I'm guessing you won't see enough records to notice this improvement).